Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-30 Thread Robert Li (baoli)
Ian,

I hope that you guys are in agreement on this. But take a look at the wiki: 
https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support and see if it has 
any difference from your proposals. IMO, it's the critical piece of the 
proposal, and it hasn't been specified in exact terms yet. I'm not sure about 
vif_attributes or vif_stats, which I just heard from you. In any case, I'm not 
convinced by the flexibility and/or complexity, and so far I haven't seen a 
use case that really demands it. But I'd be happy to see one.

thanks,
Robert

On 1/29/14 4:43 PM, Ian Wells ijw.ubu...@cack.org.uk wrote:

My proposals:

On 29 January 2014 16:43, Robert Li (baoli) ba...@cisco.com wrote:
1. pci-flavor-attrs is configured through configuration files and will be
available on both the controller node and the compute nodes. Can the cloud
admin decide to add a new attribute in a running cloud? If that's
possible, how is that done?

When nova-compute starts up, it requests the VIF attributes that the schedulers 
need.  (You could have multiple schedulers; they could be in disagreement; it 
picks the last answer.)  It returns pci_stats by the selected combination of 
VIF attributes.

When nova-scheduler starts up, it sends an unsolicited cast of the attributes.  
nova-compute updates the attributes, clears its pci_stats and recreates them.

If nova-scheduler receives pci_stats with incorrect attributes it discards them.

(There is a row from nova-compute summarising devices for each unique 
combination of vif_stats, including 'None' where no attribute is set.)

I'm assuming here that the pci_flavor_attrs are read on startup of 
nova-scheduler and could be re-read and different when nova-scheduler is reset. 
 There's a relatively straightforward move from here to an API for setting it 
if this turns out to be useful, but firstly I think it would be an uncommon 
occurrence and secondly it's not something we should implement now.

2. PCI flavor will be defined using the attributes in pci-flavor-attrs. A
flavor is defined with a matching expression in the form of attr1 = val11
[| val12 ...], [attr2 = val21 [| val22 ...]], ... And this expression is used
to match one or more PCI stats groups until a free PCI device is located.
In this case, both attr1 and attr2 can have multiple values, and both
attributes need to be satisfied. Please confirm this understanding is
correct

This looks right to me as we've discussed it, but I think we'll be wanting 
something that allows a top level AND.  In the above example, I can't say an 
Intel NIC and a Mellanox NIC are equally OK, because I can't say (intel + 
product ID 1) AND (Mellanox + product ID 2).  I'll leave Yunhong to decide how 
the details should look, though.

3. I'd like to see an example that involves multiple attributes. let's say
pci-flavor-attrs = {gpu, net-group, device_id, product_id}. I'd like to
know how PCI stats groups are formed on compute nodes based on that, and
how many of PCI stats groups are there? What's the reasonable guidelines
in defining the PCI flavors.

I need to write up the document for this, and it's overdue.  Leave it with me.
--
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-29 Thread Robert Li (baoli)
Hi Yongli,

Thank you for addressing my comments, and for adding the encryption card
use case. One thing that I want to point out is that in this use case, you
may not use the pci-flavor in the --nic option because it's not a neutron
feature.

I have a few more questions:
1. pci-flavor-attrs is configured through configuration files and will be
available on both the controller node and the compute nodes. Can the cloud
admin decide to add a new attribute in a running cloud? If that's
possible, how is that done?
2. PCI flavor will be defined using the attributes in pci-flavor-attrs. A
flavor is defined with a matching expression in the form of attr1 = val11
[| val12 ...], [attr2 = val21 [| val22 ...]], ... And this expression is used
to match one or more PCI stats groups until a free PCI device is located.
In this case, both attr1 and attr2 can have multiple values, and both
attributes need to be satisfied. Please confirm this understanding is
correct
3. I'd like to see an example that involves multiple attributes. let's say
pci-flavor-attrs = {gpu, net-group, device_id, product_id}. I'd like to
know how PCI stats groups are formed on compute nodes based on that, and
how many of PCI stats groups are there? What's the reasonable guidelines
in defining the PCI flavors.


thanks,
Robert



On 1/28/14 10:16 PM, Robert Li (baoli) ba...@cisco.com wrote:

Hi,

I added a few comments in this wiki that Yongli came up with:
https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support

Please check it out and look for Robert in the wiki.

Thanks,
Robert

On 1/21/14 9:55 AM, Robert Li (baoli) ba...@cisco.com wrote:

Yunhong, 

Just try to understand your use case:
-- a VM can only work with cards from vendor V1
-- a VM can work with cards from both vendor V1 and V2

  So stats in the two flavors will overlap in the PCI flavor
solution.
I'm just trying to say that this is something that needs to be properly
addressed.


Just for the sake of discussion, another solution to meeting the above
requirement is to be able to say in the nova flavor's extra-spec

   encrypt_card = card from vendor V1 OR encrypt_card = card from
vendor V2


In other words, this can be solved in the nova flavor, rather than
introducing a new flavor.

Thanks,
Robert
   

On 1/17/14 7:03 PM, yunhong jiang yunhong.ji...@linux.intel.com
wrote:

On Fri, 2014-01-17 at 22:30 +, Robert Li (baoli) wrote:
 Yunhong,
 
 I'm hoping that these comments can be directly addressed:
   a practical deployment scenario that requires arbitrary
 attributes.

I'm strongly against supporting only one attribute (your PCI
group) for scheduling and management; that's really TOO limited.

A simple scenario is, I have 3 encryption cards:
 Card 1 (vendor_id is V1, device_id=0xa)
 card 2 (vendor_id is V1, device_id=0xb)
 card 3 (vendor_id is V2, device_id=0xb)

 I have two images. One image only supports Card 1 and another image
supports Card 1/3 (or any other combination of the 3 card types). I don't
think only one attribute will meet such a requirement.

As to arbitrary attributes versus a limited list of attributes, my opinion is
that, as there are so many types of PCI devices and so many potential PCI
device usages, supporting arbitrary attributes will make our effort more
flexible, if we can push the implementation into the tree.

   detailed design on the following (that also take into account
 the
 introduction of predefined attributes):
 * PCI stats report since the scheduler is stats based

I don't think there is much difference from the current implementation.

 * the scheduler in support of PCI flavors with arbitrary
 attributes and potential overlapping.

As Ian said, we need to make sure the pci_stats and the PCI flavor have the
same set of attributes, so I don't think there is much difference from the
current implementation.

   networking requirements to support multiple provider
 nets/physical
 nets

Can't the extra info resolve this issue? Can you elaborate the issue?

Thanks
--jyh
 
 I guess that the above will become clear as the discussion goes on.
 And we
 also need to define the deliveries
  
 Thanks,
 Robert 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-29 Thread Ian Wells
My proposals:

On 29 January 2014 16:43, Robert Li (baoli) ba...@cisco.com wrote:

 1. pci-flavor-attrs is configured through configuration files and will be
 available on both the controller node and the compute nodes. Can the cloud
 admin decide to add a new attribute in a running cloud? If that's
 possible, how is that done?


When nova-compute starts up, it requests the VIF attributes that the
schedulers need.  (You could have multiple schedulers; they could be in
disagreement; it picks the last answer.)  It returns pci_stats by the
selected combination of VIF attributes.

When nova-scheduler starts up, it sends an unsolicited cast of the
attributes.  nova-compute updates the attributes, clears its pci_stats and
recreates them.

If nova-scheduler receives pci_stats with incorrect attributes it discards
them.

(There is a row from nova-compute summarising devices for each unique
combination of vif_stats, including 'None' where no attribute is set.)

I'm assuming here that the pci_flavor_attrs are read on startup of
nova-scheduler and could be re-read and different when nova-scheduler is
reset.  There's a relatively straightforward move from here to an API for
setting it if this turns out to be useful, but firstly I think it would be
an uncommon occurrence and secondly it's not something we should implement
now.
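
To make that concrete, here is a sketch only - the option name, the attribute
names and the device values below are illustrative assumptions, not agreed
design - of the configuration and the stats rows that would result:

  # Sketch: 'pci_flavor_attrs' as a nova.conf option shared by the controller
  # and the compute nodes, e.g.
  #   pci_flavor_attrs = vendor_id, product_id, e.physical_network
  # nova-compute then reports one pci_stats row per unique value combination.
  from collections import Counter

  PCI_FLAVOR_ATTRS = ('vendor_id', 'product_id', 'e.physical_network')

  # Devices as discovered on one compute node (values invented).
  devices = [
      {'vendor_id': '8086', 'product_id': '10ed', 'e.physical_network': 'phy1'},
      {'vendor_id': '8086', 'product_id': '10ed', 'e.physical_network': 'phy1'},
      {'vendor_id': '15b3', 'product_id': '1004', 'e.physical_network': None},
  ]

  def pci_stats(devs, attrs):
      # One row per unique attribute combination; None where an attribute
      # is not set on the device.
      return Counter(tuple(d.get(a) for a in attrs) for d in devs)

  print(pci_stats(devices, PCI_FLAVOR_ATTRS))
  # Counter({('8086', '10ed', 'phy1'): 2, ('15b3', '1004', None): 1})

Changing pci_flavor_attrs changes the shape of those rows, which is why the
stats have to be cleared and recreated as described above.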

2. PCI flavor will be defined using the attributes in pci-flavor-attrs. A
 flavor is defined with a matching expression in the form of attr1 = val11
 [| val12 ...], [attr2 = val21 [| val22 ...]], ... And this expression is used
 to match one or more PCI stats groups until a free PCI device is located.
 In this case, both attr1 and attr2 can have multiple values, and both
 attributes need to be satisfied. Please confirm this understanding is
 correct


This looks right to me as we've discussed it, but I think we'll be wanting
something that allows a top level AND.  In the above example, I can't say
an Intel NIC and a Mellanox NIC are equally OK, because I can't say (intel
+ product ID 1) AND (Mellanox + product ID 2).  I'll leave Yunhong to
decide how the details should look, though.
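
As a sketch of the matching semantics being discussed - values OR'd within an
attribute, attributes AND'd together, plus the combination of whole match
groups that is still to be decided - with the data-structure shapes invented
purely for illustration:

  # Illustration only; not an agreed interface.  A stats row is taken to be
  # a dict of attribute name -> value.
  def expr_matches(stats_row, expr):
      # expr: {attr: set of acceptable values}; values within an attribute
      # are OR'd, and every attribute in the expression must be satisfied.
      return all(stats_row.get(attr) in vals for attr, vals in expr.items())

  # "attr1 = val11 | val12, attr2 = val21" as a single expression:
  expr_intel_10g = {'vendor_id': {'8086'}, 'product_id': {'10ed', '10fb'}}

  # The missing piece: combining whole expressions in one flavor, so that
  # "(Intel + product ID 1)" and "(Mellanox + product ID 2)" are equally OK:
  flavor_any_nic = [
      {'vendor_id': {'8086'}, 'product_id': {'10ed'}},
      {'vendor_id': {'15b3'}, 'product_id': {'1004'}},
  ]

  def flavor_matches(stats_row, flavor):
      return any(expr_matches(stats_row, expr) for expr in flavor)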

3. I'd like to see an example that involves multiple attributes. let's say
 pci-flavor-attrs = {gpu, net-group, device_id, product_id}. I'd like to
 know how PCI stats groups are formed on compute nodes based on that, and
 how many of PCI stats groups are there? What's the reasonable guidelines
 in defining the PCI flavors.


I need to write up the document for this, and it's overdue.  Leave it with
me.
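
Pending that write-up, a rough illustration (all values invented) of how the
stats rows for question 3 could come out:

  # Rough illustration only; the detailed document is still to be written.
  # With pci_flavor_attrs = gpu, net-group, device_id, product_id, a compute
  # node reports one row per distinct value combination it actually holds:
  example_pci_stats = {
      # (gpu, net-group, device_id, product_id): free device count
      (None, 'phy1', '0x10ed', '0x10ed'): 14,   # SR-IOV VFs trunked on phy1
      (None, 'phy2', '0x10ed', '0x10ed'): 14,   # SR-IOV VFs trunked on phy2
      ('k20', None, '0x1021', '0x1021'): 2,     # GPUs, not in any net-group
  }
  # The number of rows is bounded by the number of distinct device types
  # (times the net-groups they sit on), not by the device count, and a PCI
  # flavor's matching expression selects whole rows.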
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-21 Thread Ian Wells
Document updated to talk about network aware scheduling (
https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit# -
section just before the use case list).

Based on yesterday's meeting, rkukura would also like to see network-aware
scheduling work for non-PCI cases - where servers are not necessarily
connected to every physical segment and machines therefore need placing
based on where they can reach the networks they need.  I think this is an
exact parallel to the PCI case, except that we're also constrained by a
count of resources (you can connect an infinite number of VMs to a software
bridge, of course).  We should implement the scheduling changes as a
separate batch of work that solves both problems, if we can - and this
works with the two step approach, because step 1 brings us up to Neutron
parity and step 2 will add network-aware scheduling for both PCI and
non-PCI cases.
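
A minimal sketch of what such a network-reachability check could look like on
the scheduler side - the function and its inputs are hypothetical, invented
only to illustrate the parallel with the PCI case:

  # Hypothetical sketch; not an existing Nova filter.
  def host_can_reach(host_physical_segments, requested_physical_segments):
      # True if every physical segment needed by the instance's networks is
      # trunked to this host - the non-PCI analogue of the PCI case, with no
      # per-device count to decrement.
      return set(requested_physical_segments) <= set(host_physical_segments)

  # A host trunked on phy1 only cannot take an instance that needs phy2:
  assert host_can_reach({'phy1'}, {'phy1'})
  assert not host_can_reach({'phy1'}, {'phy1', 'phy2'})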

-- 
Ian.


On 20 January 2014 13:38, Ian Wells ijw.ubu...@cack.org.uk wrote:

 On 20 January 2014 09:28, Irena Berezovsky ire...@mellanox.com wrote:

 Hi,
 Having post PCI meeting discussion with Ian based on his proposal
 https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#
 ,
 I am  not sure that the case that quite usable for SR-IOV based
 networking is covered well by this proposal. The understanding I got is
 that VM can land on the Host that will lack suitable PCI resource.


 The issue we have is if we have multiple underlying networks in the system
 and only some Neutron networks are trunked on the network that the PCI
 device is attached to.  This can specifically happen in the case of
 provider versus trunk networks, though it's very dependent on the setup of
 your system.

 The issue is that, in the design we have, Neutron at present has no input
 into scheduling, and also that all devices in a flavor are precisely
 equivalent.  So if I say 'I want a 10G card attached to network X' I will
 get one of the cases in the 10G flavor with no regard as to whether it can
 actually attach to network X.

 I can see two options here:

 1. What I'd do right now is I would make it so that a VM that is given an
 unsuitable network card fails to run in nova-compute when Neutron discovers
 it can't attach the PCI device to the network.  This will get us a lot of
 use cases and a Neutron driver without solving the problem elegantly.
 You'd need to choose e.g. a provider or tenant network flavor, mindful of
 the network you're connecting to, so that Neutron can actually succeed,
 which is more visibility into the workings of Neutron than the user really
 ought to need.

 2. When Nova checks that all the networks exist - which, conveniently, is
 in nova-api - it also gets attributes from the networks that can be used by
 the scheduler to choose a device.  So the scheduler chooses from a flavor
 *and*, within that flavor, from a subset of those devices with appropriate
 connectivity.  If we do this then the Neutron connection code doesn't
 change - it should still fail if the connection can't be made - but it
 becomes an internal error, since it's now an issue of consistency of
 setup.

 To do this, I think we would tell Neutron 'PCI extra-info X should be set
 to Y for this provider network and Z for tenant networks' - the precise
 implementation would be somewhat up to the driver - and then add the
 additional check in the scheduler.  The scheduling attributes list would
 have to include that attribute.

 Can you please provide an example for the required cloud admin PCI related
 configurations on nova-compute and controller node with regards to the
 following simplified scenario:
  -- There are 2 provider networks (phy1, phy2), each one has associated
 range on vlan-ids
  -- Each compute node has 2 vendor adapters with SR-IOV  enabled feature,
 exposing xx Virtual Functions.
  -- Every VM vnic on virtual network on provider network  phy1 or phy2
  should be pci pass-through vnic.


 So, we would configure Neutron to check the 'e.physical_network' attribute
 on connection and to return it as a requirement on networks.  Any PCI on
 provider network 'phy1' would be tagged e.physical_network = 'phy1'.  When
 returning the network, an extra attribute would be supplied (perhaps
 something like 'pci_requirements = { e.physical_network = 'phy1' }').  And
 nova-api would know that, in the case of macvtap and PCI directmap, it
 would need to pass this additional information to the scheduler which would
 need to make use of it in finding a device, over and above the flavor
 requirements.

 Neutron, when mapping a PCI port, would similarly work out from the
 Neutron network the trunk it needs to connect to, and would reject any
 mapping that didn't conform. If it did, it would work out how to
 encapsulate the traffic from the PCI device and set that up on the PF of
 the port.

 I'm not saying this is the only or best solution, but it does have the
 advantage that it keeps all of 

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-21 Thread Robert Li (baoli)
Just one comment:
  The devices allocated for an instance are immediately known after
the domain is created. Therefore it's possible to do a port update and
have the device configured while the instance is booting.

--Robert

On 1/19/14 2:15 AM, Irena Berezovsky ire...@mellanox.com wrote:

Hi Robert, Yonhong,
Although network XML solution (option 1) is very elegant, it has one
major disadvantage. As Robert mentioned, the disadvantage of the network
XML is the inability to know what SR-IOV PCI device was actually
allocated. When neutron is responsible to set networking configuration,
manage admin status, set security groups, it should be able to identify
the SR-IOV PCI device to apply configuration. Within current libvirt
Network XML implementation, it does not seem possible.
Between option (2) and (3), I do not have any preference, it should be as
simple as possible.
Option (3) that I raised can be achieved by renaming the network
interface of the Virtual Function via 'ip link set <device> name <new name>'. Interface logical
name can be based on neutron port UUID. This will  allow neutron to
discover devices, if backend plugin requires it. Once VM is migrating,
suitable Virtual Function on the target node should be allocated, and
then its corresponding network interface should be renamed to same
logical name. This can be done without system rebooting. Still need to
check how the Virtual Function corresponding network interface can be
returned to its original name once is not used anymore as VM vNIC.

Regards,
Irena 

-Original Message-
From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Friday, January 17, 2014 9:06 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
support

Robert, thanks for your long reply. Personally I'd prefer option 2/3 as
it keep Nova the only entity for PCI management.

Glad you are ok with Ian's proposal and we have solution to resolve the
libvirt network scenario in that framework.

Thanks
--jyh

 -Original Message-
 From: Robert Li (baoli) [mailto:ba...@cisco.com]
 Sent: Friday, January 17, 2014 7:08 AM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support
 
 Yunhong,
 
 Thank you for bringing that up on the live migration support. In
 addition to the two solutions you mentioned, Irena has a different
 solution. Let me put them all here again:
 1. network xml/group based solution.
In this solution, each host that supports a provider
 net/physical net can define a SRIOV group (it's hard to avoid the term
 as you can see from the suggestion you made based on the PCI flavor
 proposal). For each SRIOV group supported on a compute node, A network
 XML will be created the first time the nova compute service is running
 on that node.
 * nova will conduct scheduling, but not PCI device allocation
 * it's a simple and clean solution, documented in libvirt as
 the way to support live migration with SRIOV. In addition, a network
 xml is nicely mapped into a provider net.
 2. network xml per PCI device based solution
This is the solution you brought up in this email, and Ian
 mentioned this to me as well. In this solution, a network xml is
 created when A VM is created. the network xml needs to be removed once
 the VM is removed. This hasn't been tried out as far as I  know.
 3. interface xml/interface rename based solution
Irena brought this up. In this solution, the ethernet interface
 name corresponding to the PCI device attached to the VM needs to be
 renamed. One way to do so without requiring system reboot is to change
 the udev rule's file for interface renaming, followed by a udev
 reload.
 
 Now, with the first solution, Nova doesn't seem to have control over
 or visibility of the PCI device allocated for the VM before the VM is
 launched. This needs to be confirmed with the libvirt support and see
 if such capability can be provided. This may be a potential drawback
 if a neutron plugin requires detailed PCI device information for
operation.
 Irena may provide more insight into this. Ideally, neutron shouldn't
 need this information because the device configuration can be done by
 libvirt invoking the PCI device driver.
 
 The other two solutions are similar. For example, you can view the
 second solution as one way to rename an interface, or camouflage an
 interface under a network name. They all require additional works
 before the VM is created and after the VM is removed.
 
 I also agree with you that we should take a look at XenAPI on this.
 
 
 With regard to your suggestion on how to implement the first solution
 with some predefined group attribute, I think it definitely can be
 done. As I have pointed it out earlier, the PCI flavor proposal is
 actually a generalized version of the PCI group. In other words, in
 the PCI group

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-21 Thread Robert Li (baoli)
Yunhong, 

Just try to understand your use case:
-- a VM can only work with cards from vendor V1
-- a VM can work with cards from both vendor V1 and V2

  So stats in the two flavors will overlap in the PCI flavor solution.
I'm just trying to say that this is something that needs to be properly
addressed.


Just for the sake of discussion, another solution to meeting the above
requirement is to be able to say in the nova flavor's extra-spec

   encrypt_card = card from vendor V1 OR encrypt_card = card from
vendor V2


In other words, this can be solved in the nova flavor, rather than
introducing a new flavor.
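
A hypothetical sketch of what that could look like - the extra-spec key and
the value syntax below are invented for illustration, not an existing Nova
interface:

  # Invented syntax, to illustrate the idea only.
  flavor_extra_specs = {
      # Either encryption card satisfies the request: the OR lives in the
      # instance (nova) flavor's extra-spec rather than in a new PCI flavor.
      'pci_passthrough:encrypt_card':
          'vendor_id=V1,device_id=0xa | vendor_id=V2,device_id=0xb',
  }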

Thanks,
Robert
   

On 1/17/14 7:03 PM, yunhong jiang yunhong.ji...@linux.intel.com wrote:

On Fri, 2014-01-17 at 22:30 +, Robert Li (baoli) wrote:
 Yunhong,
 
 I'm hoping that these comments can be directly addressed:
   a practical deployment scenario that requires arbitrary
 attributes.

I'm strongly against supporting only one attribute (your PCI
group) for scheduling and management; that's really TOO limited.

A simple scenario is, I have 3 encryption cards:
   Card 1 (vendor_id is V1, device_id=0xa)
   card 2 (vendor_id is V1, device_id=0xb)
   card 3 (vendor_id is V2, device_id=0xb)

   I have two images. One image only supports Card 1 and another image
supports Card 1/3 (or any other combination of the 3 card types). I don't
think only one attribute will meet such a requirement.

As to arbitrary attributes versus a limited list of attributes, my opinion is
that, as there are so many types of PCI devices and so many potential PCI
device usages, supporting arbitrary attributes will make our effort more
flexible, if we can push the implementation into the tree.

   detailed design on the following (that also take into account
 the
 introduction of predefined attributes):
 * PCI stats report since the scheduler is stats based

I don't think there is much difference from the current implementation.

 * the scheduler in support of PCI flavors with arbitrary
 attributes and potential overlapping.

As Ian said, we need to make sure the pci_stats and the PCI flavor have the
same set of attributes, so I don't think there is much difference from the
current implementation.

   networking requirements to support multiple provider
 nets/physical
 nets

Can't the extra info resolve this issue? Can you elaborate the issue?

Thanks
--jyh
 
 I guess that the above will become clear as the discussion goes on.
 And we
 also need to define the deliveries
  
 Thanks,
 Robert 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-20 Thread Irena Berezovsky
Hi,
Having had a post-PCI-meeting discussion with Ian based on his proposal 
https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#,
I am not sure that the case that is quite usable for SR-IOV based networking is 
covered well by this proposal. The understanding I got is that a VM can land on 
a Host that lacks a suitable PCI resource.
Can you please provide an example for the required cloud admin PCI related 
configurations on nova-compute and controller node with regards to the 
following simplified scenario:
 -- There are 2 provider networks (phy1, phy2), each one with an associated 
range of vlan-ids
 -- Each compute node has 2 vendor adapters with the SR-IOV feature enabled, 
exposing xx Virtual Functions.
 -- Every VM vnic on a virtual network on provider network phy1 or phy2 should 
be a pci pass-through vnic.

Thanks a lot,
Irena

-Original Message-
From: Robert Li (baoli) [mailto:ba...@cisco.com] 
Sent: Saturday, January 18, 2014 12:33 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Yunhong,

I'm hoping that these comments can be directly addressed:
  a practical deployment scenario that requires arbitrary attributes.
  detailed design on the following (that also take into account the 
introduction of predefined attributes):
* PCI stats report since the scheduler is stats based
* the scheduler in support of PCI flavors with arbitrary attributes and 
potential overlapping.
  networking requirements to support multiple provider nets/physical nets

I guess that the above will become clear as the discussion goes on. And we also 
need to define the deliveries
 
Thanks,
Robert

On 1/17/14 2:02 PM, Jiang, Yunhong yunhong.ji...@intel.com wrote:

Robert, thanks for your long reply. Personally I'd prefer option 2/3 as 
it keep Nova the only entity for PCI management.

Glad you are ok with Ian's proposal and we have solution to resolve the 
libvirt network scenario in that framework.

Thanks
--jyh

 -Original Message-
 From: Robert Li (baoli) [mailto:ba...@cisco.com]
 Sent: Friday, January 17, 2014 7:08 AM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through 
 network support
 
 Yunhong,
 
 Thank you for bringing that up on the live migration support. In 
addition  to the two solutions you mentioned, Irena has a different 
solution. Let me put them all here again:
 1. network xml/group based solution.
In this solution, each host that supports a provider 
net/physical  net can define a SRIOV group (it's hard to avoid the 
term as you can see  from the suggestion you made based on the PCI 
flavor proposal). For each  SRIOV group supported on a compute node, A 
network XML will be  created the  first time the nova compute service 
is running on that node.
 * nova will conduct scheduling, but not PCI device allocation
 * it's a simple and clean solution, documented in libvirt as 
the  way to support live migration with SRIOV. In addition, a network 
xml is  nicely mapped into a provider net.
 2. network xml per PCI device based solution
This is the solution you brought up in this email, and Ian  
mentioned this to me as well. In this solution, a network xml is 
created  when A VM is created. the network xml needs to be removed 
once the  VM is  removed. This hasn't been tried out as far as I  
know.
 3. interface xml/interface rename based solution
Irena brought this up. In this solution, the ethernet 
interface  name corresponding to the PCI device attached to the VM 
needs to be  renamed. One way to do so without requiring system reboot 
is to change  the  udev rule's file for interface renaming, followed 
by a udev reload.
 
 Now, with the first solution, Nova doesn't seem to have control over 
or  visibility of the PCI device allocated for the VM before the VM is  
launched. This needs to be confirmed with the libvirt support and see 
if  such capability can be provided. This may be a potential drawback 
if a  neutron plugin requires detailed PCI device information for operation.
 Irena may provide more insight into this. Ideally, neutron shouldn't 
need  this information because the device configuration can be done by 
libvirt  invoking the PCI device driver.
 
 The other two solutions are similar. For example, you can view the 
second  solution as one way to rename an interface, or camouflage an 
interface  under a network name. They all require additional works 
before the VM is  created and after the VM is removed.
 
 I also agree with you that we should take a look at XenAPI on this.
 
 
 With regard to your suggestion on how to implement the first solution 
with  some predefined group attribute, I think it definitely can be 
done. As I  have pointed it out earlier, the PCI flavor proposal is 
actually

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-20 Thread Ian Wells
On 20 January 2014 09:28, Irena Berezovsky ire...@mellanox.com wrote:

 Hi,
 Having post PCI meeting discussion with Ian based on his proposal
 https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#
 ,
 I am  not sure that the case that quite usable for SR-IOV based networking
 is covered well by this proposal. The understanding I got is that VM can
 land on the Host that will lack suitable PCI resource.


The issue we have is if we have multiple underlying networks in the system
and only some Neutron networks are trunked on the network that the PCI
device is attached to.  This can specifically happen in the case of
provider versus trunk networks, though it's very dependent on the setup of
your system.

The issue is that, in the design we have, Neutron at present has no input
into scheduling, and also that all devices in a flavor are precisely
equivalent.  So if I say 'I want a 10G card attached to network X' I will
get one of the cases in the 10G flavor with no regard as to whether it can
actually attach to network X.

I can see two options here:

1. What I'd do right now is I would make it so that a VM that is given an
unsuitable network card fails to run in nova-compute when Neutron discovers
it can't attach the PCI device to the network.  This will get us a lot of
use cases and a Neutron driver without solving the problem elegantly.
You'd need to choose e.g. a provider or tenant network flavor, mindful of
the network you're connecting to, so that Neutron can actually succeed,
which is more visibility into the workings of Neutron than the user really
ought to need.

2. When Nova checks that all the networks exist - which, conveniently, is
in nova-api - it also gets attributes from the networks that can be used by
the scheduler to choose a device.  So the scheduler chooses from a flavor
*and*, within that flavor, from a subset of those devices with appropriate
connectivity.  If we do this then the Neutron connection code doesn't
change - it should still fail if the connection can't be made - but it
becomes an internal error, since it's now an issue of consistency of
setup.

To do this, I think we would tell Neutron 'PCI extra-info X should be set
to Y for this provider network and Z for tenant networks' - the precise
implementation would be somewhat up to the driver - and then add the
additional check in the scheduler.  The scheduling attributes list would
have to include that attribute.

Can you please provide an example for the required cloud admin PCI related
 configurations on nova-compute and controller node with regards to the
 following simplified scenario:
  -- There are 2 provider networks (phy1, phy2), each one has associated
 range on vlan-ids
  -- Each compute node has 2 vendor adapters with SR-IOV  enabled feature,
 exposing xx Virtual Functions.
  -- Every VM vnic on virtual network on provider network  phy1 or phy2
  should be pci pass-through vnic.


So, we would configure Neutron to check the 'e.physical_network' attribute
on connection and to return it as a requirement on networks.  Any PCI on
provider network 'phy1' would be tagged e.physical_network = 'phy1'.  When
returning the network, an extra attribute would be supplied (perhaps
something like 'pci_requirements = { e.physical_network = 'phy1' }').  And
nova-api would know that, in the case of macvtap and PCI directmap, it
would need to pass this additional information to the scheduler which would
need to make use of it in finding a device, over and above the flavor
requirements.
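
A sketch of those pieces - the tagging mechanism, the 'pci_requirements'
attribute and its exact shape are all illustrative assumptions rather than
settled design:

  # Illustrative only.
  # 1. Devices on each compute node are tagged with the provider network
  #    they are physically wired to (values invented):
  tagged_devices = [
      {'address': '0000:08:00.1', 'vendor_id': '8086', 'product_id': '10ed',
       'e.physical_network': 'phy1'},
      {'address': '0000:09:00.1', 'vendor_id': '8086', 'product_id': '10ed',
       'e.physical_network': 'phy2'},
  ]

  # 2. When nova-api validates the requested networks, Neutron returns the
  #    extra requirement alongside each network:
  network_info = {'id': 'net-uuid', 'provider:physical_network': 'phy1',
                  'pci_requirements': {'e.physical_network': 'phy1'}}

  # 3. The scheduler then picks, within the requested PCI flavor, only those
  #    devices whose tags satisfy the requirement:
  def satisfies(device, requirements):
      return all(device.get(k) == v for k, v in requirements.items())

  candidates = [d for d in tagged_devices
                if satisfies(d, network_info['pci_requirements'])]
  # -> only the device on phy1 remains a candidate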

Neutron, when mapping a PCI port, would similarly work out from the Neutron
network the trunk it needs to connect to, and would reject any mapping that
didn't conform. If it did, it would work out how to encapsulate the traffic
from the PCI device and set that up on the PF of the port.

I'm not saying this is the only or best solution, but it does have the
advantage that it keeps all of the networking behaviour in Neutron -
hopefully Nova remains almost completely ignorant of what the network setup
is, since the only thing we have to do is pass on PCI requirements, and we
already have a convenient call flow we can use that's there for the network
existence check.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-18 Thread Irena Berezovsky
Hi Robert, Yonhong,
Although network XML solution (option 1) is very elegant, it has one major 
disadvantage. As Robert mentioned, the disadvantage of the network XML is the 
inability to know what SR-IOV PCI device was actually allocated. When neutron 
is responsible for setting networking configuration, managing admin status, and 
setting security groups, it should be able to identify the SR-IOV PCI device to 
apply the configuration. Within the current libvirt Network XML implementation, 
that does not seem possible.
Between option (2) and (3), I do not have any preference, it should be as 
simple as possible.
Option (3) that I raised can be achieved by renaming the network interface of 
the Virtual Function via 'ip link set <device> name <new name>'. The interface's 
logical name can be based on the neutron port UUID. This will allow neutron to 
discover devices, if the backend plugin requires it. Once a VM is migrating, a 
suitable Virtual Function on the target node should be allocated, and then its 
corresponding network interface should be renamed to the same logical name. This 
can be done without rebooting the system. We still need to check how the Virtual 
Function's network interface can be returned to its original name once it is no 
longer used as a VM vNIC.
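
A minimal sketch of the renaming step (the device name and the port UUID are
taken from the example later in this thread; the code itself is only an
illustration of the idea, not production logic):

  # Illustrative sketch of the rename; error handling omitted.
  import subprocess

  def rename_vf_netdev(current_name, neutron_port_uuid):
      # e.g. eth8 + port 02bc4aec-b4f4-... -> 'eth02bc4aec-b4', staying
      # within the kernel's 15-character interface name limit.
      logical_name = 'eth' + neutron_port_uuid[:11]
      subprocess.check_call(['ip', 'link', 'set', current_name, 'down'])
      subprocess.check_call(['ip', 'link', 'set', current_name,
                             'name', logical_name])
      subprocess.check_call(['ip', 'link', 'set', logical_name, 'up'])
      return logical_name

  # rename_vf_netdev('eth8', '02bc4aec-b4f4-436f-b651-024')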

Regards,
Irena 

-Original Message-
From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com] 
Sent: Friday, January 17, 2014 9:06 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Robert, thanks for your long reply. Personally I'd prefer option 2/3 as it keep 
Nova the only entity for PCI management.

Glad you are ok with Ian's proposal and we have solution to resolve the libvirt 
network scenario in that framework.

Thanks
--jyh

 -Original Message-
 From: Robert Li (baoli) [mailto:ba...@cisco.com]
 Sent: Friday, January 17, 2014 7:08 AM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network 
 support
 
 Yunhong,
 
 Thank you for bringing that up on the live migration support. In 
 addition to the two solutions you mentioned, Irena has a different 
 solution. Let me put them all here again:
 1. network xml/group based solution.
In this solution, each host that supports a provider 
 net/physical net can define a SRIOV group (it's hard to avoid the term 
 as you can see from the suggestion you made based on the PCI flavor 
 proposal). For each SRIOV group supported on a compute node, A network 
 XML will be created the first time the nova compute service is running 
 on that node.
 * nova will conduct scheduling, but not PCI device allocation
 * it's a simple and clean solution, documented in libvirt as 
 the way to support live migration with SRIOV. In addition, a network 
 xml is nicely mapped into a provider net.
 2. network xml per PCI device based solution
This is the solution you brought up in this email, and Ian 
 mentioned this to me as well. In this solution, a network xml is 
 created when A VM is created. the network xml needs to be removed once 
 the VM is removed. This hasn't been tried out as far as I  know.
 3. interface xml/interface rename based solution
Irena brought this up. In this solution, the ethernet interface 
 name corresponding to the PCI device attached to the VM needs to be 
 renamed. One way to do so without requiring system reboot is to change 
 the udev rule's file for interface renaming, followed by a udev 
 reload.
 
 Now, with the first solution, Nova doesn't seem to have control over 
 or visibility of the PCI device allocated for the VM before the VM is 
 launched. This needs to be confirmed with the libvirt support and see 
 if such capability can be provided. This may be a potential drawback 
 if a neutron plugin requires detailed PCI device information for operation.
 Irena may provide more insight into this. Ideally, neutron shouldn't 
 need this information because the device configuration can be done by 
 libvirt invoking the PCI device driver.
 
 The other two solutions are similar. For example, you can view the 
 second solution as one way to rename an interface, or camouflage an 
 interface under a network name. They all require additional works 
 before the VM is created and after the VM is removed.
 
 I also agree with you that we should take a look at XenAPI on this.
 
 
 With regard to your suggestion on how to implement the first solution 
 with some predefined group attribute, I think it definitely can be 
 done. As I have pointed it out earlier, the PCI flavor proposal is 
 actually a generalized version of the PCI group. In other words, in 
 the PCI group proposal, we have one predefined attribute called PCI 
 group, and everything else works on top of that. In the PCI flavor 
 proposal, attribute is arbitrary. So certainly we can define a 
 particular attribute for networking, which let's temporarily

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-17 Thread Robert Li (baoli)
Yunhong,

Thank you for bringing that up on the live migration support. In addition
to the two solutions you mentioned, Irena has a different solution. Let me
put them all here again:
1. network xml/group based solution.
   In this solution, each host that supports a provider net/physical
net can define a SRIOV group (it's hard to avoid the term as you can see
from the suggestion you made based on the PCI flavor proposal). For each
SRIOV group supported on a compute node, a network XML will be created the
first time the nova compute service is running on that node.
* nova will conduct scheduling, but not PCI device allocation
* it's a simple and clean solution, documented in libvirt as the
way to support live migration with SRIOV. In addition, a network xml is
nicely mapped into a provider net (see the sketch after this list).
2. network xml per PCI device based solution
   This is the solution you brought up in this email, and Ian
mentioned this to me as well. In this solution, a network xml is created
when A VM is created. the network xml needs to be removed once the VM is
removed. This hasn't been tried out as far as I  know.
3. interface xml/interface rename based solution
   Irena brought this up. In this solution, the ethernet interface
name corresponding to the PCI device attached to the VM needs to be
renamed. One way to do so without requiring system reboot is to change the
udev rule's file for interface renaming, followed by a udev reload.
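
For reference, the libvirt construct behind the first solution - a network
defined as a pool of physical interfaces that guests attach to via macvtap -
looks roughly like this (the pool and device names are illustrative):

  <network>
    <name>sriov-group-phy1</name>
    <forward mode='passthrough'>
      <interface dev='eth11'/>
      <interface dev='eth12'/>
    </forward>
  </network>

  <!-- The guest's libvirt.xml then refers to the pool by name, so an
       equivalent interface can be picked on the migration target: -->
  <interface type='network'>
    <source network='sriov-group-phy1'/>
  </interface>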

Now, with the first solution, Nova doesn't seem to have control over or
visibility of the PCI device allocated for the VM before the VM is
launched. This needs to be confirmed with the libvirt support and see if
such capability can be provided. This may be a potential drawback if a
neutron plugin requires detailed PCI device information for operation.
Irena may provide more insight into this. Ideally, neutron shouldn't need
this information because the device configuration can be done by libvirt
invoking the PCI device driver.

The other two solutions are similar. For example, you can view the second
solution as one way to rename an interface, or camouflage an interface
under a network name. They all require additional work before the VM is
created and after the VM is removed.

I also agree with you that we should take a look at XenAPI on this.


With regard to your suggestion on how to implement the first solution with
some predefined group attribute, I think it definitely can be done. As I
have pointed out earlier, the PCI flavor proposal is actually a
generalized version of the PCI group. In other words, in the PCI group
proposal, we have one predefined attribute called PCI group, and
everything else works on top of that. In the PCI flavor proposal,
attribute is arbitrary. So certainly we can define a particular attribute
for networking, which let's temporarily call sriov_group. But I can see
with this idea of predefined attributes, more of them will be required by
different types of devices in the future. I'm sure it will keep us busy
although I'm not sure it's in a good way.

I was expecting you or someone else could provide a practical deployment
scenario that would justify the flexibilities and the complexities.
Although I'd prefer to keep it simple and generalize it later once a
particular requirement is clearly identified, I'm fine to go with it if
that's what most of the folks want to do.

--Robert



On 1/16/14 8:36 PM, yunhong jiang yunhong.ji...@linux.intel.com wrote:

On Thu, 2014-01-16 at 01:28 +0100, Ian Wells wrote:
 To clarify a couple of Robert's points, since we had a conversation
 earlier:
 On 15 January 2014 23:47, Robert Li (baoli) ba...@cisco.com wrote:
   ---  do we agree that BDF address (or device id, whatever
 you call it), and node id shouldn't be used as attributes in
 defining a PCI flavor?
 
 
 Note that the current spec doesn't actually exclude it as an option.
 It's just an unwise thing to do.  In theory, you could elect to define
 your flavors using the BDF attribute but determining 'the card in this
 slot is equivalent to all the other cards in the same slot in other
 machines' is probably not the best idea...  We could lock it out as an
 option or we could just assume that administrators wouldn't be daft
 enough to try.
 
 
 * the compute node needs to know the PCI flavor.
 [...] 
   - to support live migration, we need to use
 it to create network xml
 
 
 I didn't understand this at first and it took me a while to get what
 Robert meant here.
 
 This is based on Robert's current code for macvtap based live
 migration.  The issue is that if you wish to migrate a VM and it's
 tied to a physical interface, you can't guarantee that the same
 physical interface is going to be used on the target machine, but at
 the same time you can't change the libvirt.xml as it comes over with
 the migrating machine.  The answer is to define a network 

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-17 Thread Jiang, Yunhong
Robert, thanks for your long reply. Personally I'd prefer option 2/3 as it keeps 
Nova the only entity for PCI management.

Glad you are ok with Ian's proposal and we have solution to resolve the libvirt 
network scenario in that framework.

Thanks
--jyh

 -Original Message-
 From: Robert Li (baoli) [mailto:ba...@cisco.com]
 Sent: Friday, January 17, 2014 7:08 AM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support
 
 Yunhong,
 
 Thank you for bringing that up on the live migration support. In addition
 to the two solutions you mentioned, Irena has a different solution. Let me
 put them all here again:
 1. network xml/group based solution.
In this solution, each host that supports a provider net/physical
 net can define a SRIOV group (it's hard to avoid the term as you can see
 from the suggestion you made based on the PCI flavor proposal). For each
 SRIOV group supported on a compute node, A network XML will be
 created the
 first time the nova compute service is running on that node.
 * nova will conduct scheduling, but not PCI device allocation
 * it's a simple and clean solution, documented in libvirt as the
 way to support live migration with SRIOV. In addition, a network xml is
 nicely mapped into a provider net.
 2. network xml per PCI device based solution
This is the solution you brought up in this email, and Ian
 mentioned this to me as well. In this solution, a network xml is created
 when A VM is created. the network xml needs to be removed once the
 VM is
 removed. This hasn't been tried out as far as I  know.
 3. interface xml/interface rename based solution
Irena brought this up. In this solution, the ethernet interface
 name corresponding to the PCI device attached to the VM needs to be
 renamed. One way to do so without requiring system reboot is to change
 the
 udev rule's file for interface renaming, followed by a udev reload.
 
 Now, with the first solution, Nova doesn't seem to have control over or
 visibility of the PCI device allocated for the VM before the VM is
 launched. This needs to be confirmed with the libvirt support and see if
 such capability can be provided. This may be a potential drawback if a
 neutron plugin requires detailed PCI device information for operation.
 Irena may provide more insight into this. Ideally, neutron shouldn't need
 this information because the device configuration can be done by libvirt
 invoking the PCI device driver.
 
 The other two solutions are similar. For example, you can view the second
 solution as one way to rename an interface, or camouflage an interface
 under a network name. They all require additional works before the VM is
 created and after the VM is removed.
 
 I also agree with you that we should take a look at XenAPI on this.
 
 
 With regard to your suggestion on how to implement the first solution with
 some predefined group attribute, I think it definitely can be done. As I
 have pointed it out earlier, the PCI flavor proposal is actually a
 generalized version of the PCI group. In other words, in the PCI group
 proposal, we have one predefined attribute called PCI group, and
 everything else works on top of that. In the PCI flavor proposal,
 attribute is arbitrary. So certainly we can define a particular attribute
 for networking, which let's temporarily call sriov_group. But I can see
 with this idea of predefined attributes, more of them will be required by
 different types of devices in the future. I'm sure it will keep us busy
 although I'm not sure it's in a good way.
 
 I was expecting you or someone else can provide a practical deployment
 scenario that would justify the flexibilities and the complexities.
 Although I'd prefer to keep it simple and generalize it later once a
 particular requirement is clearly identified, I'm fine to go with it if
 that's most of the folks want to do.
 
 --Robert
 
 
 
 On 1/16/14 8:36 PM, yunhong jiang yunhong.ji...@linux.intel.com
 wrote:
 
 On Thu, 2014-01-16 at 01:28 +0100, Ian Wells wrote:
  To clarify a couple of Robert's points, since we had a conversation
  earlier:
  On 15 January 2014 23:47, Robert Li (baoli) ba...@cisco.com wrote:
---  do we agree that BDF address (or device id, whatever
  you call it), and node id shouldn't be used as attributes in
  defining a PCI flavor?
 
 
  Note that the current spec doesn't actually exclude it as an option.
  It's just an unwise thing to do.  In theory, you could elect to define
  your flavors using the BDF attribute but determining 'the card in this
  slot is equivalent to all the other cards in the same slot in other
  machines' is probably not the best idea...  We could lock it out as an
  option or we could just assume that administrators wouldn't be daft
  enough to try.
 
 
  * the compute node needs to know the PCI flavor

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-17 Thread Robert Li (baoli)
Yunhong,

I'm hoping that these comments can be directly addressed:
  a practical deployment scenario that requires arbitrary attributes.
  detailed design on the following (that also take into account the
introduction of predefined attributes):
* PCI stats report since the scheduler is stats based
* the scheduler in support of PCI flavors with arbitrary
attributes and potential overlapping.
  networking requirements to support multiple provider nets/physical
nets

I guess that the above will become clear as the discussion goes on. And we
also need to define the deliveries
 
Thanks,
Robert

On 1/17/14 2:02 PM, Jiang, Yunhong yunhong.ji...@intel.com wrote:

Robert, thanks for your long reply. Personally I'd prefer option 2/3 as
it keep Nova the only entity for PCI management.

Glad you are ok with Ian's proposal and we have solution to resolve the
libvirt network scenario in that framework.

Thanks
--jyh

 -Original Message-
 From: Robert Li (baoli) [mailto:ba...@cisco.com]
 Sent: Friday, January 17, 2014 7:08 AM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support
 
 Yunhong,
 
 Thank you for bringing that up on the live migration support. In
addition
 to the two solutions you mentioned, Irena has a different solution. Let
me
 put them all here again:
 1. network xml/group based solution.
In this solution, each host that supports a provider net/physical
 net can define a SRIOV group (it's hard to avoid the term as you can see
 from the suggestion you made based on the PCI flavor proposal). For each
 SRIOV group supported on a compute node, A network XML will be
 created the
 first time the nova compute service is running on that node.
 * nova will conduct scheduling, but not PCI device allocation
 * it's a simple and clean solution, documented in libvirt as the
 way to support live migration with SRIOV. In addition, a network xml is
 nicely mapped into a provider net.
 2. network xml per PCI device based solution
This is the solution you brought up in this email, and Ian
 mentioned this to me as well. In this solution, a network xml is created
 when A VM is created. the network xml needs to be removed once the
 VM is
 removed. This hasn't been tried out as far as I  know.
 3. interface xml/interface rename based solution
Irena brought this up. In this solution, the ethernet interface
 name corresponding to the PCI device attached to the VM needs to be
 renamed. One way to do so without requiring system reboot is to change
 the
 udev rule's file for interface renaming, followed by a udev reload.
 
 Now, with the first solution, Nova doesn't seem to have control over or
 visibility of the PCI device allocated for the VM before the VM is
 launched. This needs to be confirmed with the libvirt support and see if
 such capability can be provided. This may be a potential drawback if a
 neutron plugin requires detailed PCI device information for operation.
 Irena may provide more insight into this. Ideally, neutron shouldn't
need
 this information because the device configuration can be done by libvirt
 invoking the PCI device driver.
 
 The other two solutions are similar. For example, you can view the
second
 solution as one way to rename an interface, or camouflage an interface
 under a network name. They all require additional works before the VM is
 created and after the VM is removed.
 
 I also agree with you that we should take a look at XenAPI on this.
 
 
 With regard to your suggestion on how to implement the first solution
with
 some predefined group attribute, I think it definitely can be done. As I
 have pointed it out earlier, the PCI flavor proposal is actually a
 generalized version of the PCI group. In other words, in the PCI group
 proposal, we have one predefined attribute called PCI group, and
 everything else works on top of that. In the PCI flavor proposal,
 attribute is arbitrary. So certainly we can define a particular
attribute
 for networking, which let's temporarily call sriov_group. But I can see
 with this idea of predefined attributes, more of them will be required
by
 different types of devices in the future. I'm sure it will keep us busy
 although I'm not sure it's in a good way.
 
 I was expecting you or someone else can provide a practical deployment
 scenario that would justify the flexibilities and the complexities.
 Although I'd prefer to keep it simple and generalize it later once a
 particular requirement is clearly identified, I'm fine to go with it if
 that's most of the folks want to do.
 
 --Robert
 
 
 
 On 1/16/14 8:36 PM, yunhong jiang yunhong.ji...@linux.intel.com
 wrote:
 
 On Thu, 2014-01-16 at 01:28 +0100, Ian Wells wrote:
  To clarify a couple of Robert's points, since we had a conversation
  earlier:
  On 15 January 2014 23:47, Robert Li (baoli) ba...@cisco.com wrote

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-17 Thread yunhong jiang
On Fri, 2014-01-17 at 22:30 +, Robert Li (baoli) wrote:
 Yunhong,
 
 I'm hoping that these comments can be directly addressed:
   a practical deployment scenario that requires arbitrary
 attributes.

I'm strongly against supporting only one attribute (your PCI
group) for scheduling and management; that's really TOO limited.

A simple scenario is, I have 3 encryption cards:
Card 1 (vendor_id is V1, device_id=0xa)
card 2 (vendor_id is V1, device_id=0xb)
card 3 (vendor_id is V2, device_id=0xb)

I have two images. One image only supports Card 1 and another image
supports Card 1/3 (or any other combination of the 3 card types). I don't
think only one attribute will meet such a requirement.
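
To spell that out, the two flavors this scenario needs would look something
like the following (the expression syntax is illustrative only):

  # Illustrative only; the expression syntax is not settled.
  flavor_for_image1 = [                      # image 1: only Card 1 will do
      {'vendor_id': {'V1'}, 'device_id': {'0xa'}},
  ]
  flavor_for_image2 = [                      # image 2: Card 1 or Card 3
      {'vendor_id': {'V1'}, 'device_id': {'0xa'}},
      {'vendor_id': {'V2'}, 'device_id': {'0xb'}},
  ]
  # Card 1 satisfies both flavors, so the two flavors necessarily overlap;
  # a single opaque group label per device cannot express both at once.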

As to arbitrary attributes versus a limited list of attributes, my opinion is
that, as there are so many types of PCI devices and so many potential PCI
device usages, supporting arbitrary attributes will make our effort more
flexible, if we can push the implementation into the tree.

   detailed design on the following (that also take into account
 the
 introduction of predefined attributes):
 * PCI stats report since the scheduler is stats based

I don't think there is much difference from the current implementation.

 * the scheduler in support of PCI flavors with arbitrary
 attributes and potential overlapping.

As Ian said, we need to make sure the pci_stats and the PCI flavor have the
same set of attributes, so I don't think there is much difference from the
current implementation.

   networking requirements to support multiple provider
 nets/physical
 nets

Can't the extra info resolve this issue? Can you elaborate the issue?

Thanks
--jyh
 
 I guess that the above will become clear as the discussion goes on.
 And we
 also need to define the deliveries
  
 Thanks,
 Robert 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-16 Thread yongli he

On 2014-01-16 08:28, Ian Wells wrote:
To clarify a couple of Robert's points, since we had a conversation 
earlier:
On 15 January 2014 23:47, Robert Li (baoli) ba...@cisco.com wrote:


---  do we agree that BDF address (or device id, whatever you call
it), and node id shouldn't be used as attributes in defining a PCI
flavor?


Note that the current spec doesn't actually exclude it as an option.  
It's just an unwise thing to do.  In theory, you could elect to define 
your flavors using the BDF attribute but determining 'the card in this 
slot is equivalent to all the other cards in the same slot in other 
machines' is probably not the best idea...  We could lock it out as an 
option or we could just assume that administrators wouldn't be daft 
enough to try.


  * the compute node needs to know the PCI flavor. [...]
  - to support live migration, we need to use it
to create network xml


I didn't understand this at first and it took me a while to get what 
Robert meant here.


This is based on Robert's current code for macvtap based live 
migration.  The issue is that if you wish to migrate a VM and it's 
tied to a physical interface, you can't guarantee that the same 
physical interface is going to be used on the target machine, but at 
the same time you can't change the libvirt.xml as it comes over with 
the migrating machine.  The answer is to define a network and refer 
out to it from libvirt.xml.  In Robert's current code he's using the 
group name of the PCI devices to create a network containing the list 
of equivalent devices (those in the group) that can be macvtapped.  
Thus when the host migrates it will find another, equivalent, 
interface. This falls over in the use case under
but, with the flavor we defined, the group could be a tag for this purpose, 
and all of Robert's design still works, so it's OK, right?
consideration where a device can be mapped using more than one flavor, 
so we have to discard the use case or rethink the implementation.


There's a more complex solution - I think - where we create a 
temporary network for each macvtap interface a machine's going to use, 
with a name based on the instance UUID and port number, and containing 
the device to map. Before starting the migration we would create a 
replacement network containing only the new device on the target host; 
migration would find the network from the name in the libvirt.xml, and 
the content of that network would behave identically.  We'd be 
creating libvirt networks on the fly and a lot more of them, and we'd 
need decent cleanup code too ('when freeing a PCI device, delete any 
network it's a member of'), so it all becomes a lot more hairy.
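
A rough sketch of that idea (the naming scheme and helper below are my own
illustration, not an agreed design): one single-interface libvirt network per
instance/port, named from the instance UUID and port number so the target host
can pre-create an equivalent network before migration:

def temp_network_xml(instance_uuid, port_index, physical_dev):
    # One temporary libvirt network per macvtap interface; the migration
    # target defines the same name around a different physical device.
    name = 'tmp-%s-%d' % (instance_uuid, port_index)
    return (
        "<network>\n"
        "  <name>%s</name>\n"
        "  <forward mode='bridge'>\n"
        "    <interface dev='%s'/>\n"
        "  </forward>\n"
        "</network>\n" % (name, physical_dev)
    )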

--
Ian.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-16 Thread Irena Berezovsky
Ian,
Thank you for putting in writing the ongoing discussed specification.
I have added few comments on the Google doc [1].

As for live migration support, this can be done also without libvirt network 
usage.
Not very elegant, but working:  rename the interface of the PCI device to some 
logical name, let's say based on neutron port UUID and put it into the 
interface XML, i.e.:
If the PCI device network interface name is eth8 and the neutron port UUID is 
02bc4aec-b4f4-436f-b651-024, then rename it to something like eth02bc4aec-b4. 
The interface XML will look like this:

  ...
  <interface type='direct'>
    <mac address='fa:16:3e:46:d3:e8'/>
    <source dev='eth02bc4aec-b4' mode='passthrough'/>
    <target dev='macvtap0'/>
    <model type='virtio'/>
    <alias name='net0'/>
    <address type='pci' domain='0x' bus='0x00' slot='0x03' function='0x0'/>
  </interface>

  ...
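
A minimal sketch of the rename step (a hypothetical helper, assuming the usual
iproute2 tooling and the 15-character Linux interface name limit):

import subprocess

def rename_vf_interface(current_name, port_uuid):
    # 'eth' + the first 11 characters of the port UUID stays within IFNAMSIZ.
    new_name = 'eth' + port_uuid[:11]
    subprocess.check_call(['ip', 'link', 'set', current_name, 'down'])
    subprocess.check_call(['ip', 'link', 'set', current_name, 'name', new_name])
    subprocess.check_call(['ip', 'link', 'set', new_name, 'up'])
    return new_name

# rename_vf_interface('eth8', '02bc4aec-b4f4-436f-b651-024') -> 'eth02bc4aec-b4'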

[1] 
https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#heading=h.308b0wqn1zde

BR,
Irena
From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Thursday, January 16, 2014 2:34 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

To clarify a couple of Robert's points, since we had a conversation earlier:
On 15 January 2014 23:47, Robert Li (baoli) 
ba...@cisco.commailto:ba...@cisco.com wrote:
  ---  do we agree that BDF address (or device id, whatever you call it), and 
node id shouldn't be used as attributes in defining a PCI flavor?

Note that the current spec doesn't actually exclude it as an option.  It's just 
an unwise thing to do.  In theory, you could elect to define your flavors using 
the BDF attribute but determining 'the card in this slot is equivalent to all 
the other cards in the same slot in other machines' is probably not the best 
idea...  We could lock it out as an option or we could just assume that 
administrators wouldn't be daft enough to try.
* the compute node needs to know the PCI flavor. [...]
  - to support live migration, we need to use it to create 
network xml

I didn't understand this at first and it took me a while to get what Robert 
meant here.

This is based on Robert's current code for macvtap based live migration.  The 
issue is that if you wish to migrate a VM and it's tied to a physical 
interface, you can't guarantee that the same physical interface is going to be 
used on the target machine, but at the same time you can't change the 
libvirt.xml as it comes over with the migrating machine.  The answer is to 
define a network and refer out to it from libvirt.xml.  In Robert's current 
code he's using the group name of the PCI devices to create a network 
containing the list of equivalent devices (those in the group) that can be 
macvtapped.  Thus when the host migrates it will find another, equivalent, 
interface.  This falls over in the use case under consideration where a device 
can be mapped using more than one flavor, so we have to discard the use case or 
rethink the implementation.

There's a more complex solution - I think - where we create a temporary network 
for each macvtap interface a machine's going to use, with a name based on the 
instance UUID and port number, and containing the device to map.  Before 
starting the migration we would create a replacement network containing only 
the new device on the target host; migration would find the network from the 
name in the libvirt.xml, and the content of that network would behave 
identically.  We'd be creating libvirt networks on the fly and a lot more of 
them, and we'd need decent cleanup code too ('when freeing a PCI device, delete 
any network it's a member of'), so it all becomes a lot more hairy.
--
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-16 Thread Ian Wells
On 16 January 2014 09:07, yongli he yongli...@intel.com wrote:

  On 2014-01-16 08:28, Ian Wells wrote:

 This is based on Robert's current code for macvtap based live migration.
 The issue is that if you wish to migrate a VM and it's tied to a physical
 interface, you can't guarantee that the same physical interface is going to
 be used on the target machine, but at the same time you can't change the
 libvirt.xml as it comes over with the migrating machine.  The answer is to
 define a network and refer out to it from libvirt.xml.  In Robert's current
 code he's using the group name of the PCI devices to create a network
 containing the list of equivalent devices (those in the group) that can be
 macvtapped.  Thus when the host migrates it will find another, equivalent,
 interface.  This falls over in the use case under

 but, with the flavor we defined, the group could be a tag for this purpose,
 and all of Robert's design still works, so it's OK, right?


Well, you could make a label up consisting of the values of the attributes
in the group, but since a flavor can encompass multiple groups (for
instance, I group by device and vendor and then I use two device types in
my flavor) this still doesn't work.  Irena's solution does, though.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-16 Thread Sandhya Dasu (sadasu)
Hi Irena,
   Thanks for pointing out an alternative to the network xml solution to live 
migration. I am still not clear about the solution.

Some questions:

  1.  Where does the rename of the PCI device network interface name occur?
  2.  Can this rename be done for a VF? I think your example shows rename of a 
PF.

Thanks,
Sandhya

From: Irena Berezovsky ire...@mellanox.commailto:ire...@mellanox.com
Reply-To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.orgmailto:openstack-dev@lists.openstack.org
Date: Thursday, January 16, 2014 4:43 AM
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.orgmailto:openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Ian,
Thank you for putting in writing the ongoing discussed specification.
I have added few comments on the Google doc [1].

As for live migration support, this can be done also without libvirt network 
usage.
Not very elegant, but working:  rename the interface of the PCI device to some 
logical name, let’s say based on neutron port UUID and put it into the 
interface XML, i.e.:
If PCI device network interface name  is eth8 and neutron port UUID is   
02bc4aec-b4f4-436f-b651-024 then rename it to something like: eth02bc4aec-b4'. 
The interface XML will look like this:

  ...
  <interface type='direct'>
    <mac address='fa:16:3e:46:d3:e8'/>
    <source dev='eth02bc4aec-b4' mode='passthrough'/>
    <target dev='macvtap0'/>
    <model type='virtio'/>
    <alias name='net0'/>
    <address type='pci' domain='0x' bus='0x00' slot='0x03' function='0x0'/>
  </interface>

  ...

[1] 
https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#heading=h.308b0wqn1zde

BR,
Irena
From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Thursday, January 16, 2014 2:34 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

To clarify a couple of Robert's points, since we had a conversation earlier:
On 15 January 2014 23:47, Robert Li (baoli) 
ba...@cisco.commailto:ba...@cisco.com wrote:
  ---  do we agree that BDF address (or device id, whatever you call it), and 
node id shouldn't be used as attributes in defining a PCI flavor?

Note that the current spec doesn't actually exclude it as an option.  It's just 
an unwise thing to do.  In theory, you could elect to define your flavors using 
the BDF attribute but determining 'the card in this slot is equivalent to all 
the other cards in the same slot in other machines' is probably not the best 
idea...  We could lock it out as an option or we could just assume that 
administrators wouldn't be daft enough to try.
* the compute node needs to know the PCI flavor. [...]
  - to support live migration, we need to use it to create 
network xml

I didn't understand this at first and it took me a while to get what Robert 
meant here.

This is based on Robert's current code for macvtap based live migration.  The 
issue is that if you wish to migrate a VM and it's tied to a physical 
interface, you can't guarantee that the same physical interface is going to be 
used on the target machine, but at the same time you can't change the 
libvirt.xml as it comes over with the migrating machine.  The answer is to 
define a network and refer out to it from libvirt.xml.  In Robert's current 
code he's using the group name of the PCI devices to create a network 
containing the list of equivalent devices (those in the group) that can be 
macvtapped.  Thus when the host migrates it will find another, equivalent, 
interface.  This falls over in the use case under consideration where a device 
can be mapped using more than one flavor, so we have to discard the use case or 
rethink the implementation.

There's a more complex solution - I think - where we create a temporary network 
for each macvtap interface a machine's going to use, with a name based on the 
instance UUID and port number, and containing the device to map.  Before 
starting the migration we would create a replacement network containing only 
the new device on the target host; migration would find the network from the 
name in the libvirt.xml, and the content of that network would behave 
identically.  We'd be creating libvirt networks on the fly and a lot more of 
them, and we'd need decent cleanup code too ('when freeing a PCI device, delete 
any network it's a member of'), so it all becomes a lot more hairy.
--
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-16 Thread yunhong jiang
On Thu, 2014-01-16 at 01:28 +0100, Ian Wells wrote:
 To clarify a couple of Robert's points, since we had a conversation
 earlier:
 On 15 January 2014 23:47, Robert Li (baoli) ba...@cisco.com wrote:
   ---  do we agree that BDF address (or device id, whatever
 you call it), and node id shouldn't be used as attributes in
 defining a PCI flavor?
 
 
 Note that the current spec doesn't actually exclude it as an option.
 It's just an unwise thing to do.  In theory, you could elect to define
 your flavors using the BDF attribute but determining 'the card in this
 slot is equivalent to all the other cards in the same slot in other
 machines' is probably not the best idea...  We could lock it out as an
 option or we could just assume that administrators wouldn't be daft
 enough to try.
 
 
 * the compute node needs to know the PCI flavor.
 [...] 
   - to support live migration, we need to use
 it to create network xml
 
 
 I didn't understand this at first and it took me a while to get what
 Robert meant here.
 
 This is based on Robert's current code for macvtap based live
 migration.  The issue is that if you wish to migrate a VM and it's
 tied to a physical interface, you can't guarantee that the same
 physical interface is going to be used on the target machine, but at
 the same time you can't change the libvirt.xml as it comes over with
 the migrating machine.  The answer is to define a network and refer
 out to it from libvirt.xml.  In Robert's current code he's using the
 group name of the PCI devices to create a network containing the list
 of equivalent devices (those in the group) that can be macvtapped.
 Thus when the host migrates it will find another, equivalent,
 interface.  This falls over in the use case under consideration where
 a device can be mapped using more than one flavor, so we have to
 discard the use case or rethink the implementation.
 
 There's a more complex solution - I think - where we create a
 temporary network for each macvtap interface a machine's going to use,
 with a name based on the instance UUID and port number, and containing
 the device to map.  Before starting the migration we would create a
 replacement network containing only the new device on the target host;
 migration would find the network from the name in the libvirt.xml, and
 the content of that network would behave identically.  We'd be
 creating libvirt networks on the fly and a lot more of them, and we'd
 need decent cleanup code too ('when freeing a PCI device, delete any
 network it's a member of'), so it all becomes a lot more hairy.
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Ian/Robert, below is my understanding of the method Robert wants to use,
am I right?

a) Define a libvirt network as in the 'Using a macvtap direct connection'
section at http://libvirt.org/formatnetwork.html . For example, like the
following one:
<network>
  <name>group_name1</name>
  <forward mode='bridge'>
    <interface dev='eth20'/>
    <interface dev='eth21'/>
    <interface dev='eth22'/>
    <interface dev='eth23'/>
    <interface dev='eth24'/>
  </forward>
</network>


b) When assigning SRIOV NIC devices to an instance, as in the 'Assignment from
a pool of SRIOV VFs in a libvirt network definition' section in
http://wiki.libvirt.org/page/Networking#PCI_Passthrough_of_host_network_devices
, use the libvirt network definition group_name1. For example, like the following one:

  <interface type='network'>
    <source network='group_name1'/>
  </interface>
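
By way of illustration only (my own sketch with libvirt-python, not Robert's
actual code), defining such a group network on a host and referring to it from
the instance XML would look roughly like:

import libvirt

def ensure_group_network(group_name, interface_devs):
    # Build the same <network> XML as above and define/start it on this host.
    xml = "<network>\n  <name>%s</name>\n  <forward mode='bridge'>\n" % group_name
    xml += ''.join("    <interface dev='%s'/>\n" % dev for dev in interface_devs)
    xml += "  </forward>\n</network>"
    conn = libvirt.open('qemu:///system')
    try:
        net = conn.networkDefineXML(xml)
        net.create()
        net.setAutostart(True)
    finally:
        conn.close()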


If my understanding is correct, then I still have something unclear:
a) How will libvirt create the libvirt network (i.e. the libvirt network
group_name1)? Will it be created when the compute node boots up, or will it be
created before instance creation? I suppose per Robert's design it's
created when the compute node is up, am I right?

b) If all the interfaces are used up by instances, what will happen?
Considering that 4 interfaces are allocated to the group_name1 libvirt
network, and a user tries to migrate 6 instances with the 'group_name1' network,
what will happen?

And below are my comments:

a) Yes, this is in fact different from the current nova PCI support
philosophy. Currently we assume Nova owns the devices and manages the device
assignment to each instance, while in this situation the libvirt network is
in fact another layer of PCI device management (although a very thin one)!

b) This also reminds me that possibly other VMMs like XenAPI have special
requirements, and we need input/confirmation from them as well.


As to how to resolve the issue, I think there are several solutions:

a) Create one libvirt network for each SRIOV NIC assigned to each
instance dynamically, i.e. the libvirt network always has only one interface
included; it may be statically or dynamically created. This solution
in fact removes the 

[openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-15 Thread Robert Li (baoli)
Hi Folks,

In light of today's IRC meeting, and for the purpose of moving this forward, 
I'm fine to go with the following if that's what everyone wants to go with:

 
https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit

But with some concerns and reservations.

  ---  I don't expect everyone to agree on this. But I think the proposal is 
much more complicated in terms of implementation and administration.
  ---  I'd like to see a practical deployment scenario that only PCI flavor 
can support but PCI group can't, which I guess can justify the complexities.
  ---  do we agree that BDF address (or device id, whatever you call it), and 
node id shouldn't be used as attributes in defining a PCI flavor?
  ---  I'd like to see a detailed (not vague) design on the following:
* PCI stats report since the scheduler is stats based
* the scheduler in support of PCI flavors with arbitrary attributes.
  ---  I'd like to see how this can be mapped into SRIOV support:
* the compute node needs to know the PCI flavor. A couple of reasons 
for this:
  - the neutron plugin may need this to associate with a 
particular subsystem (or physical network)
  - to support live migration, we need to use it to create 
network xml
* We also need to be able to do auto discovery so that we can support 
live migration with SRIOV
* use the PCI flavor in the --nic option and neutron commands
  --- Just want to point out that this PCI flavor doesn't seem to be the same 
PCI flavor that John was talking about in one of his emails.

I'd like to also point out that if you consider a PCI group as an attribute (in 
terms of the proposal), then the PCI group design is a special (or degenerated) 
case of the proposed design. The significant difference here is that with PCI 
group, its semantics is clear and well defined, and everything else works on 
top of it. An attribute is arbitrary and open for interpretation. In terms of 
getting things done ASAP, the PCI group is actually the way to go.

I guess that we will take a phased approach to implement it so that we can get 
something done in Icehouse. However, I'd like to see that neutron requirements 
one way or the other can be satisfied in the first phase.

Maybe we can continue the IRC tomorrow and talk about the above. Again, let's 
move on if that's really where we want to go.

thanks,
Robert


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-15 Thread Ian Wells
To clarify a couple of Robert's points, since we had a conversation earlier:
On 15 January 2014 23:47, Robert Li (baoli) ba...@cisco.com wrote:

   ---  do we agree that BDF address (or device id, whatever you call it),
 and node id shouldn't be used as attributes in defining a PCI flavor?


Note that the current spec doesn't actually exclude it as an option.  It's
just an unwise thing to do.  In theory, you could elect to define your
flavors using the BDF attribute but determining 'the card in this slot is
equivalent to all the other cards in the same slot in other machines' is
probably not the best idea...  We could lock it out as an option or we
could just assume that administrators wouldn't be daft enough to try.

* the compute node needs to know the PCI flavor. [...]
   - to support live migration, we need to use it to create
 network xml


I didn't understand this at first and it took me a while to get what Robert
meant here.

This is based on Robert's current code for macvtap based live migration.
The issue is that if you wish to migrate a VM and it's tied to a physical
interface, you can't guarantee that the same physical interface is going to
be used on the target machine, but at the same time you can't change the
libvirt.xml as it comes over with the migrating machine.  The answer is to
define a network and refer out to it from libvirt.xml.  In Robert's current
code he's using the group name of the PCI devices to create a network
containing the list of equivalent devices (those in the group) that can be
macvtapped.  Thus when the host migrates it will find another, equivalent,
interface.  This falls over in the use case under consideration where a
device can be mapped using more than one flavor, so we have to discard the
use case or rethink the implementation.

There's a more complex solution - I think - where we create a temporary
network for each macvtap interface a machine's going to use, with a name
based on the instance UUID and port number, and containing the device to
map.  Before starting the migration we would create a replacement network
containing only the new device on the target host; migration would find the
network from the name in the libvirt.xml, and the content of that network
would behave identically.  We'd be creating libvirt networks on the fly and
a lot more of them, and we'd need decent cleanup code too ('when freeing a
PCI device, delete any network it's a member of'), so it all becomes a lot
more hairy.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Irena Berezovsky
Hi,
After having a lot of discussions both on IRC and the mailing list, I would like to 
suggest defining basic use cases for PCI pass-through network support, with an 
agreed list of limitations and assumptions, and implementing them. By doing this 
Proof of Concept we will be able to deliver basic PCI pass-through network 
support in the Icehouse timeframe and understand better how to provide a complete 
solution, starting from tenant/admin API enhancement, enhancing nova-neutron 
communication and eventually providing a neutron plugin supporting PCI 
pass-through networking.
We can try to split tasks between currently involved participants and bring up 
the basic case. Then we can enhance the implementation.
Having more knowledge and experience with neutron parts, I would like  to start 
working on neutron mechanism driver support.  I have already started to arrange 
the following blueprint doc based on everyone's ideas:
https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit

For the basic PCI pass-through networking case we can assume the following:

1.   Single provider network (PN1)

2.   White list of available SRIOV PCI devices for allocation as NIC for 
neutron networks on provider network  (PN1) is defined on each compute node

3.   Support directly assigned SRIOV PCI pass-through device as vNIC. (This 
will limit the number of tests)

4.   More 


If my suggestion seems reasonable to you, let's try to reach an agreement and 
split the work during our Monday IRC meeting.

BR,
Irena

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Saturday, January 11, 2014 8:36 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Comments with prefix [yjiang5_2] , including the double confirm.

I think we (you and me) are mostly on the same page. Would you please give a 
summary, and then we can have the community, including Irena/Robert, check it. 
We need cores to sponsor it. We should check with John to see if this is 
different from his mental picture, and we may need a neutron core (I assume 
Cisco has a bunch of Neutron cores :) ) to sponsor it.

And, will anyone from Cisco be able to help on the implementation? After this long 
discussion, we are past the halfway point of the I release and I'm not sure if Yongli and I 
alone can finish it all in the I release.

Thanks
--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 6:34 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support



 OK - so if this is good then I think the question is how we could change the 
 'pci_whitelist' parameter we have - which, as you say, should either *only* 
 do whitelisting or be renamed - to allow us to add information.  Yongli has 
 something along those lines but it's not flexible and it distinguishes poorly 
 between which bits are extra information and which bits are matching 
 expressions (and it's still called pci_whitelist) - but even with those 
 criticisms it's very close to what we're talking about.  When we have that I 
 think a lot of the rest of the arguments should simply resolve themselves.



 [yjiang5_1] The reason it's not easy to find a flexible/distinguishable 
 change to pci_whitelist is because it combined two things. So a stupid/naive 
 solution in my head is: change it to a VERY generic name, 
 'pci_devices_information',

 and change the schema to an array of {'devices_property'=regex exp, 'group_name' 
 = 'g1'} dictionaries, where the device_property expression can be 'address ==xxx, 
 vendor_id == xxx' (i.e. similar to the current white list), and we can squeeze 
 more into pci_devices_information in future, like 'network_information' 
 = xxx or the Neutron specific information you required in the previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ match = { class = "Acme inc. discombobulator" }, info = { group = "we like 
teh groups", volume = 11 } }
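
For illustration, a fully spelled-out entry in that two-dict form, as it might
sit in a JSON config file (the key names follow the strawman above; the values
are made up):

import json

entry = {
    "match": {"vendor_id": "8086", "address": "0000:06:*"},
    "info": {"group": "phynet1", "network_id": "net-id-1"},
}
print(json.dumps([entry], indent=2))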

[yjiang5_2] Double confirm that 'match' is whitelist, and info is 'extra info', 
right?  Can the key be more meaningful, for example, 
s/match/pci_device_property,  s/info/pci_device_info, or s/match/pci_devices/  
etc.
Also assume the class should be the class code in the configuration space, and 
be digital, am I right? Otherwise, it's not easy to get the 'Acme inc. 
discombobulator' information.



 All keys other than 'device_property' becomes extra information, i.e. 
 software defined property. These extra information will be carried with the 
 PCI devices,. Some implementation details, A)we can limit the acceptable 
 keys, like we only support 'group_name', 'network_id', or we can accept any 
 keys other than reserved (vendor_id

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Ian Wells
Irena, have a word with Bob (rkukura on IRC, East coast), he was talking
about what would be needed already and should be able to help you.
Conveniently he's also core. ;)
-- 
Ian.


On 12 January 2014 22:12, Irena Berezovsky ire...@mellanox.com wrote:

 Hi John,
 Thank you for taking an initiative and summing up the work that need to be
 done to provide PCI pass-through network support.
 The only item I think is missing is the neutron support for PCI
 pass-through. Currently we have Mellanox Plugin that supports PCI
 pass-through assuming Mellanox Adapter card embedded switch technology. But
 in order to have fully integrated  PCI pass-through networking support for
 the use cases Robert listed on previous mail, the generic neutron PCI
 pass-through support is required. This can be enhanced with vendor specific
 task that may differ (Mellanox Embedded switch vs Cisco 802.1BR), but there
 is still common part of being PCI aware mechanism driver.
 I have already started with definition for this part:

 https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit#
 I also plan to start coding soon.

 Depending on how it goes, I can also take the nova parts that integrate with
 neutron APIs from item 3.

 Regards,
 Irena

 -Original Message-
 From: John Garbutt [mailto:j...@johngarbutt.com]
 Sent: Friday, January 10, 2014 4:34 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support

 Apologies for this top post, I just want to move this discussion towards
 action.

 I am traveling next week so it is unlikely that I can make the meetings.
 Sorry.

 Can we please agree on some concrete actions, and who will do the coding?
 This also means raising new blueprints for each item of work.
 I am happy to review and eventually approve those blueprints, if you email
 me directly.

 Ideas are taken from what we started to agree on, mostly written up here:
 https://wiki.openstack.org/wiki/Meetings/Passthrough#Definitions


 What doesn't need doing...
 

 We have PCI whitelist and PCI alias at the moment, let's keep those names
 the same for now.
 I personally prefer PCI-flavor, rather than PCI-alias, but let's discuss
 any rename separately.

 We seemed happy with the current system (roughly) around GPU passthrough:
 nova flavor-key three_GPU_attached_30GB set pci_passthrough:alias=
 large_GPU:1,small_GPU:2
 nova boot --image some_image --flavor three_GPU_attached_30GB some_name

 Again, we seemed happy with the current PCI whitelist.

 Sure, we could optimise the scheduling, but again, please keep that a
 separate discussion.
 Something in the scheduler needs to know how many of each PCI alias are
 available on each host.
 How that information gets there can be changed at a later date.

 PCI alias is in config, but it's probably better defined using host
 aggregates, or some custom API.
 But let's leave that for now, and discuss it separately.
 If the need arises, we can migrate away from the config.


 What does need doing...
 ==

 1) API  CLI changes for nic-type, and associated tempest tests

 * Add a user visible nic-type so users can express one of several network
 types.
 * We need a default nic-type, for when the user doesn't specify one (might
 default to SRIOV in some cases)
 * We can easily test the case where the default is virtual and the user
 expresses a preference for virtual
 * Above is much better than not testing it at all.

 nova boot --flavor m1.large --image image_id
   --nic net-id=net-id-1
   --nic net-id=net-id-2,nic-type=fast
   --nic net-id=net-id-3,nic-type=fast vm-name

 or

 neutron port-create
   --fixed-ip subnet_id=subnet-id,ip_address=192.168.57.101
   --nic-type=slow | fast | foobar
   net-id
 nova boot --flavor m1.large --image image_id --nic port-id=port-id

 Where nic-type is just an extra bit metadata string that is passed to nova
 and the VIF driver.


 2) Expand PCI alias information

 We need extensions to PCI alias so we can group SRIOV devices better.

 I still think we are yet to agree on a format, but I would suggest this as
 a starting point:

 {
  "name": "GPU_fast",
  "devices": [
   {"vendor_id": "1137", "product_id": "0071", "address": "*", "attach-type": "direct"},
   {"vendor_id": "1137", "product_id": "0072", "address": "*", "attach-type": "direct"}
  ],
  "sriov_info": {}
 }

 {
  "name": "NIC_fast",
  "devices": [
   {"vendor_id": "1137", "product_id": "0071", "address": "0:[1-50]:2:*", "attach-type": "macvtap"},
   {"vendor_id": "1234", "product_id": "0081", "address": "*", "attach-type": "direct"}
  ],
  "sriov_info": {
   "nic_type": "fast",
   "network_ids": ["net-id-1", "net-id-2"]
  }
 }

 {
  "name": "NIC_slower",
  "devices": [
   {"vendor_id": "1137", "product_id": "0071", "address": "*", "attach-type": "direct"},
   {"vendor_id": "1234", "product_id": "0081", "address": "*", "attach-type": "direct"}
  ],
  "sriov_info": {
   "nic_type": "fast",
   "network_ids": ["*"]  # this means it could attach to any network
  }
 }
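
As a sanity check of the format, a small sketch (a hypothetical helper, with
fnmatch standing in for whatever wildcard matching gets chosen) of testing a
discovered device against one of the devices entries above:

from fnmatch import fnmatch

def device_matches(entry, device):
    # Compare every key in the alias entry except attach-type, which says how
    # to use the device rather than how to recognise it.
    for key, pattern in entry.items():
        if key == 'attach-type':
            continue
        if not fnmatch(str(device.get(key, '')), pattern):
            return False
    return True

dev = {'vendor_id': '1137', 'product_id': '0071', 'address': '0000:08:10.1'}
print(device_matches({'vendor_id': '1137', 'product_id': '0071',
                      'address': '*', 'attach-type': 'macvtap'}, dev))   # True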

 The idea being the VIF driver gets passed this info, when

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Irena Berezovsky
Ian,
It's great news.
Thank you for bringing Bob's attention to this effort. I'll look for Bob on IRC 
to get the details.
And of course, core support raises our chances to make PCI pass-through 
networking into icehouse.

BR,
Irena

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Monday, January 13, 2014 2:02 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Irena, have a word with Bob (rkukura on IRC, East coast), he was talking about 
what would be needed already and should be able to help you.  Conveniently he's 
also core. ;)
--
Ian.

On 12 January 2014 22:12, Irena Berezovsky 
ire...@mellanox.commailto:ire...@mellanox.com wrote:
Hi John,
Thank you for taking an initiative and summing up the work that need to be done 
to provide PCI pass-through network support.
The only item I think is missing is the neutron support for PCI pass-through. 
Currently we have Mellanox Plugin that supports PCI pass-through assuming 
Mellanox Adapter card embedded switch technology. But in order to have fully 
integrated  PCI pass-through networking support for the use cases Robert listed 
on previous mail, the generic neutron PCI pass-through support is required. 
This can be enhanced with vendor specific task that may differ (Mellanox 
Embedded switch vs Cisco 802.1BR), but there is still common part of being PCI 
aware mechanism driver.
I have already started with definition for this part:
https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit
I also plan to start coding soon.

Depends on how it goes, I can take also nova parts that integrate with neutron 
APIs from item 3.

Regards,
Irena

-Original Message-
From: John Garbutt [mailto:j...@johngarbutt.commailto:j...@johngarbutt.com]
Sent: Friday, January 10, 2014 4:34 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
Apologies for this top post, I just want to move this discussion towards action.

I am traveling next week so it is unlikely that I can make the meetings. Sorry.

Can we please agree on some concrete actions, and who will do the coding?
This also means raising new blueprints for each item of work.
I am happy to review and eventually approve those blueprints, if you email me 
directly.

Ideas are taken from what we started to agree on, mostly written up here:
https://wiki.openstack.org/wiki/Meetings/Passthrough#Definitions


What doesn't need doing...


We have PCI whitelist and PCI alias at the moment, let keep those names the 
same for now.
I personally prefer PCI-flavor, rather than PCI-alias, but lets discuss any 
rename separately.

We seemed happy with the current system (roughly) around GPU passthrough:
nova flavor-key three_GPU_attached_30GB set pci_passthrough:alias= 
large_GPU:1,small_GPU:2
nova boot --image some_image --flavor three_GPU_attached_30GB some_name

Again, we seemed happy with the current PCI whitelist.

Sure, we could optimise the scheduling, but again, please keep that a separate 
discussion.
Something in the scheduler needs to know how many of each PCI alias are 
available on each host.
How that information gets there can be change at a later date.

PCI alias is in config, but its probably better defined using host aggregates, 
or some custom API.
But lets leave that for now, and discuss it separately.
If the need arrises, we can migrate away from the config.


What does need doing...
==

1) API  CLI changes for nic-type, and associated tempest tests

* Add a user visible nic-type so users can express on of several network 
types.
* We need a default nic-type, for when the user doesn't specify one (might 
default to SRIOV in some cases)
* We can easily test the case where the default is virtual and the user 
expresses a preference for virtual
* Above is much better than not testing it at all.

nova boot --flavor m1.large --image image_id
  --nic net-id=net-id-1
  --nic net-id=net-id-2,nic-type=fast
  --nic net-id=net-id-3,nic-type=fast vm-name

or

neutron port-create
  --fixed-ip subnet_id=subnet-id,ip_address=192.168.57.101
  --nic-type=slow | fast | foobar
  net-id
nova boot --flavor m1.large --image image_id --nic port-id=port-id

Where nic-type is just an extra bit metadata string that is passed to nova and 
the VIF driver.


2) Expand PCI alias information

We need extensions to PCI alias so we can group SRIOV devices better.

I still think we are yet to agree on a format, but I would suggest this as a 
starting point:

{
 "name": "GPU_fast",
 "devices": [
  {"vendor_id": "1137", "product_id": "0071", "address": "*", "attach-type": "direct"},
  {"vendor_id": "1137", "product_id": "0072", "address": "*", "attach-type": "direct"}
 ],
 "sriov_info": {}
}

{
 name:NIC_fast,
 devices:[
  {vendor_id:1137,product_id:0071

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Robert Li (baoli)
As I have responded in the other email, and If I understand PCI flavor 
correctly, then the issue that we need to deal with is the overlapping issue. A 
simplest case of this overlapping is that you can define a flavor F1 as 
[vendor_id='v', product_id='p'], and a flavor F2 as [vendor_id = 'v'] .  Let's 
assume that only the admin can define the flavors. It's not hard to see that a 
device can belong to the two different flavors at the same time. This 
introduces an issue in the scheduler. Suppose the scheduler (counts or stats 
based) maintains counts based on flavors (or the keys corresponding to the 
flavors). To request a device with the flavor F1,  counts in F2 needs to be 
subtracted by one as well. There may be several ways to achieve that. But 
regardless, it introduces tremendous overhead in terms of system processing and 
administrative costs.
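
To illustrate the bookkeeping problem with a tiny sketch (the flavor contents
are the hypothetical F1/F2 above, the counters are the naive approach being
questioned):

F1 = {'vendor_id': 'v', 'product_id': 'p'}
F2 = {'vendor_id': 'v'}

def matches(spec, device):
    return all(device.get(k) == val for k, val in spec.items())

device = {'vendor_id': 'v', 'product_id': 'p', 'address': '0000:06:00.1'}
print(matches(F1, device), matches(F2, device))   # True True

# Naive per-flavor free counters: allocating this device for an F1 request
# must also decrement the F2 counter, otherwise F2's count goes stale.
free = {'F1': 1, 'F2': 1}
free['F1'] -= 1
free['F2'] -= 1   # the extra bookkeeping that the overlap forces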

What are the use cases for that? How practical are those use cases?

thanks,
Robert

On 1/10/14 9:34 PM, Ian Wells 
ijw.ubu...@cack.org.ukmailto:ijw.ubu...@cack.org.uk wrote:



 OK - so if this is good then I think the question is how we could change the 
 'pci_whitelist' parameter we have - which, as you say, should either *only* 
 do whitelisting or be renamed - to allow us to add information.  Yongli has 
 something along those lines but it's not flexible and it distinguishes poorly 
 between which bits are extra information and which bits are matching 
 expressions (and it's still called pci_whitelist) - but even with those 
 criticisms it's very close to what we're talking about.  When we have that I 
 think a lot of the rest of the arguments should simply resolve themselves.



 [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
 change to pci_whitelist is because it combined two things. So a stupid/naive 
 solution in my head is, change it to VERY generic name, 
 ‘pci_devices_information’,

 and change schema as an array of {‘devices_property’=regex exp, ‘group_name’ 
 = ‘g1’} dictionary, and the device_property expression can be ‘address ==xxx, 
 vendor_id == xxx’ (i.e. similar with current white list),  and we can squeeze 
 more into the “pci_devices_information” in future, like ‘network_information’ 
 = xxx or “Neutron specific information” you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ match = { class = Acme inc. discombobulator }, info = { group = we like 
teh groups, volume = 11 } }


 All keys other than ‘device_property’ becomes extra information, i.e. 
 software defined property. These extra information will be carried with the 
 PCI devices,. Some implementation details, A)we can limit the acceptable 
 keys, like we only support ‘group_name’, ‘network_id’, or we can accept any 
 keys other than reserved (vendor_id, device_id etc) one.


Not sure we have a good list of reserved keys at the moment, and with two dicts 
it isn't really necessary, I guess.  I would say that we have one match parser 
which looks something like this:

# does this PCI device match the expression given?
def match(expression, pci_details, extra_specs):
    for (k, v) in expression.items():
        if k.startswith('e.'):
            # 'e.'-prefixed keys look in the extra (software-defined) info
            mv = extra_specs.get(k[2:])
        else:
            # plain keys look in the device's own properties
            mv = pci_details.get(k)
        # simple equality here; a regex comparator could be dropped in instead
        if mv != v:
            return False
    return True

Usable in this matching (where 'e.' just won't work) and also for flavor 
assignment (where e. will indeed match the extra values).
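
A usage sketch of the function above, assuming plain device properties in a
whitelist-style expression and an 'e.' prefix for the extra info used by
flavors (the property values are made up):

pci_details = {'vendor_id': '8086', 'product_id': '10fb'}
extra_specs = {'group': 'phynet1'}

# whitelist-style expression: plain device properties only
print(match({'vendor_id': '8086'}, pci_details, extra_specs))            # True

# flavor-style expression: may also reference extra info via 'e.'
print(match({'e.group': 'phynet1', 'vendor_id': '8086'},
            pci_details, extra_specs))                                   # True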

 B) if a device match ‘device_property’ in several entries, raise exception, 
 or use the first one.

Use the first one, I think.  It's easier, and potentially more useful.

 [yjiang5_1] Another thing need discussed is, as you pointed out, “we would 
 need to add a config param on the control host to decide which flags to group 
 on when doing the stats”.  I agree with the design, but some details need 
 decided.

This is a patch that can come at any point after we do the above stuff (which 
we need for Neutron), clearly.

 Where should it defined. If we a) define it in both control node and compute 
 node, then it should be static defined (just change pool_keys in 
 /opt/stack/nova/nova/pci/pci_stats.py to a configuration parameter) . Or b) 
 define only in control node, then I assume the control node should be the 
 scheduler node, and the scheduler manager need save such information, present 
 a API to fetch such information and the compute node need fetch it on every 
 update_available_resource() periodic task. I’d prefer to take a) option in 
 first step. Your idea?

I think it has to be (a), which is a shame.
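
For option a), the change is roughly this kind of thing (a sketch with an
assumed option name and defaults, not an agreed interface); the same value
would have to be configured on the scheduler and on every compute node:

from oslo.config import cfg

# Sketch: replace the static pool_keys list in nova/pci/pci_stats.py with a
# configuration option.
pci_opts = [
    cfg.ListOpt('pci_stats_pool_keys',
                default=['vendor_id', 'product_id', 'extra_info'],
                help='Attributes used to group PCI devices into stats pools'),
]
CONF = cfg.CONF
CONF.register_opts(pci_opts)
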
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Jiang, Yunhong
Hi Robert, the scheduler keeps counts based on pci_stats instead of the PCI flavor.

As stated by Ian at 
https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg13455.html 
already, the flavor will only use the tags used by pci_stats.

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, January 13, 2014 8:22 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

As I have responded in the other email, and If I understand PCI flavor 
correctly, then the issue that we need to deal with is the overlapping issue. A 
simplest case of this overlapping is that you can define a flavor F1 as 
[vendor_id='v', product_id='p'], and a flavor F2 as [vendor_id = 'v'] .  Let's 
assume that only the admin can define the flavors. It's not hard to see that a 
device can belong to the two different flavors in the same time. This 
introduces an issue in the scheduler. Suppose the scheduler (counts or stats 
based) maintains counts based on flavors (or the keys corresponding to the 
flavors). To request a device with the flavor F1,  counts in F2 needs to be 
subtracted by one as well. There may be several ways to achieve that. But 
regardless, it introduces tremendous overhead in terms of system processing and 
administrative costs.

What are the use cases for that? How practical are those use cases?

thanks,
Robert

On 1/10/14 9:34 PM, Ian Wells 
ijw.ubu...@cack.org.ukmailto:ijw.ubu...@cack.org.uk wrote:



 OK - so if this is good then I think the question is how we could change the 
 'pci_whitelist' parameter we have - which, as you say, should either *only* 
 do whitelisting or be renamed - to allow us to add information.  Yongli has 
 something along those lines but it's not flexible and it distinguishes poorly 
 between which bits are extra information and which bits are matching 
 expressions (and it's still called pci_whitelist) - but even with those 
 criticisms it's very close to what we're talking about.  When we have that I 
 think a lot of the rest of the arguments should simply resolve themselves.



 [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
 change to pci_whitelist is because it combined two things. So a stupid/naive 
 solution in my head is, change it to VERY generic name, 
 'pci_devices_information',

 and change schema as an array of {'devices_property'=regex exp, 'group_name' 
 = 'g1'} dictionary, and the device_property expression can be 'address ==xxx, 
 vendor_id == xxx' (i.e. similar with current white list),  and we can squeeze 
 more into the pci_devices_information in future, like 'network_information' 
 = xxx or Neutron specific information you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ match = { class = Acme inc. discombobulator }, info = { group = we like 
teh groups, volume = 11 } }


 All keys other than 'device_property' becomes extra information, i.e. 
 software defined property. These extra information will be carried with the 
 PCI devices,. Some implementation details, A)we can limit the acceptable 
 keys, like we only support 'group_name', 'network_id', or we can accept any 
 keys other than reserved (vendor_id, device_id etc) one.


Not sure we have a good list of reserved keys at the moment, and with two dicts 
it isn't really necessary, I guess.  I would say that we have one match parser 
which looks something like this:

# does this PCI device match the expression given?
def match(expression, pci_details, extra_specs):
    for (k, v) in expression.items():
        if k.startswith('e.'):
            # 'e.'-prefixed keys look in the extra (software-defined) info
            mv = extra_specs.get(k[2:])
        else:
            # plain keys look in the device's own properties
            mv = pci_details.get(k)
        # simple equality here; a regex comparator could be dropped in instead
        if mv != v:
            return False
    return True

Usable in this matching (where 'e.' just won't work) and also for flavor 
assignment (where e. will indeed match the extra values).

 B) if a device match 'device_property' in several entries, raise exception, 
 or use the first one.

Use the first one, I think.  It's easier, and potentially more useful.

 [yjiang5_1] Another thing need discussed is, as you pointed out, we would 
 need to add a config param on the control host to decide which flags to group 
 on when doing the stats.  I agree with the design, but some details need 
 decided.

This is a patch that can come at any point after we do the above stuff (which 
we need for Neutron), clearly.

 Where should it defined. If we a) define it in both control node and compute 
 node, then it should be static defined (just change pool_keys in 
 /opt/stack/nova/nova/pci/pci_stats.py to a configuration parameter) . Or b) 
 define only in control node, then I assume the control node should be the 
 scheduler node, and the scheduler manager need save such information, present 
 a API to fetch such information

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Ian Wells
It's worth noting that this makes the scheduling a computationally hard
problem. The answer to that in this scheme is to reduce the number of
inputs to trivialise the problem.  It's going to be O(f(number of flavor
types requested, number of pci_stats pools)) and if you group appropriately
there shouldn't be an excessive number of pci_stats pools.  I am not going
to stand up and say this makes it achievable - and if it doesn't then I'm
not sure that anything would make overlapping flavors achievable - but I
think it gives us some hope.
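
For what it's worth, a small sketch (hypothetical structures) of the per-host
check, which shows where the cost comes from: every requested flavor has to be
walked against every pool it might overlap with:

def host_can_satisfy(requests, pools):
    # requests: list of (flavor_spec_dict, count); pools: list of dicts with
    # the grouping attributes plus a 'count'.  Greedy consumption, so in the
    # worst case overlapping flavors would need backtracking - hence the
    # complexity worry above.
    pools = [dict(p) for p in pools]   # don't mutate the caller's pools
    for spec, wanted in requests:
        for pool in pools:
            if wanted == 0:
                break
            if all(pool.get(k) == v for k, v in spec.items()):
                take = min(wanted, pool['count'])
                pool['count'] -= take
                wanted -= take
        if wanted:
            return False
    return True
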
-- 
Ian.


On 13 January 2014 19:27, Jiang, Yunhong yunhong.ji...@intel.com wrote:

  Hi, Robert, scheduler keep count based on pci_stats instead of the pci
 flavor.



 As stated by Ian at
 https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg13455.html already,
  the flavor will only use the tags used by pci_stats.



 Thanks

 --jyh



 *From:* Robert Li (baoli) [mailto:ba...@cisco.com]
 *Sent:* Monday, January 13, 2014 8:22 AM

 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support



 As I have responded in the other email, and If I understand PCI flavor
 correctly, then the issue that we need to deal with is the overlapping
 issue. A simplest case of this overlapping is that you can define a flavor
 F1 as [vendor_id='v', product_id='p'], and a flavor F2 as [vendor_id = 'v']
 .  Let's assume that only the admin can define the flavors. It's not hard
 to see that a device can belong to the two different flavors in the same
 time. This introduces an issue in the scheduler. Suppose the scheduler
 (counts or stats based) maintains counts based on flavors (or the keys
 corresponding to the flavors). To request a device with the flavor F1,
  counts in F2 needs to be subtracted by one as well. There may be several
 ways to achieve that. But regardless, it introduces tremendous overhead in
 terms of system processing and administrative costs.



 What are the use cases for that? How practical are those use cases?



 thanks,

 Robert



 On 1/10/14 9:34 PM, Ian Wells ijw.ubu...@cack.org.uk wrote:




 
  OK - so if this is good then I think the question is how we could change
 the 'pci_whitelist' parameter we have - which, as you say, should either
 *only* do whitelisting or be renamed - to allow us to add information.
  Yongli has something along those lines but it's not flexible and it
 distinguishes poorly between which bits are extra information and which
 bits are matching expressions (and it's still called pci_whitelist) - but
 even with those criticisms it's very close to what we're talking about.
  When we have that I think a lot of the rest of the arguments should simply
 resolve themselves.
 
 
 
  [yjiang5_1] The reason that not easy to find a flexible/distinguishable
 change to pci_whitelist is because it combined two things. So a
 stupid/naive solution in my head is, change it to VERY generic name,
 ‘pci_devices_information’,
 
  and change schema as an array of {‘devices_property’=regex exp,
 ‘group_name’ = ‘g1’} dictionary, and the device_property expression can be
 ‘address ==xxx, vendor_id == xxx’ (i.e. similar with current white list),
  and we can squeeze more into the “pci_devices_information” in future, like
 ‘network_information’ = xxx or “Neutron specific information” you required
 in previous mail.


 We're getting to the stage that an expression parser would be useful,
 annoyingly, but if we are going to try and squeeze it into JSON can I
 suggest:

 { match = { class = Acme inc. discombobulator }, info = { group = we
 like teh groups, volume = 11 } }

 
  All keys other than ‘device_property’ becomes extra information, i.e.
 software defined property. These extra information will be carried with the
 PCI devices,. Some implementation details, A)we can limit the acceptable
 keys, like we only support ‘group_name’, ‘network_id’, or we can accept any
 keys other than reserved (vendor_id, device_id etc) one.


 Not sure we have a good list of reserved keys at the moment, and with two
 dicts it isn't really necessary, I guess.  I would say that we have one
 match parser which looks something like this:

 # does this PCI device match the expression given?
 def match(expression, pci_details, extra_specs):
for (k, v) in expression:
 if k.starts_with('e.'):
mv = extra_specs.get(k[2:])
 else:
mv = pci_details.get(k[2:])
 if not match(m, mv):
 return False
 return True

 Usable in this matching (where 'e.' just won't work) and also for flavor
 assignment (where e. will indeed match the extra values).

  B) if a device match ‘device_property’ in several entries, raise
 exception, or use the first one.

 Use the first one, I think.  It's easier, and potentially more useful.

  [yjiang5_1] Another thing need discussed is, as you pointed out, “we
 would need to add a config param on the control

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Jiang, Yunhong
I'm not a network engineer and I'm always lost in the 802.1Qbh/802.1BR specs :(  So I'd 
wait for the requirements from Neutron. A quick check suggests my discussion with Ian 
meets the requirements already?

Thanks
--jyh

From: Irena Berezovsky [mailto:ire...@mellanox.com]
Sent: Monday, January 13, 2014 12:51 AM
To: OpenStack Development Mailing List (not for usage questions)
Cc: Jiang, Yunhong; He, Yongli; Robert Li (baoli) (ba...@cisco.com); Sandhya 
Dasu (sadasu) (sad...@cisco.com); ijw.ubu...@cack.org.uk; j...@johngarbutt.com
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi,
After having a lot of discussions both on IRC and mailing list, I would like to 
suggest to define basic use cases for PCI pass-through network support with 
agreed list of limitations and assumptions  and implement it.  By doing this 
Proof of Concept we will be able to deliver basic PCI pass-through network 
support in Icehouse timeframe and understand better how to provide complete 
solution starting from  tenant /admin API enhancement, enhancing nova-neutron 
communication and eventually provide neutron plugin  supporting the PCI 
pass-through networking.
We can try to split tasks between currently involved participants and bring up 
the basic case. Then we can enhance the implementation.
Having more knowledge and experience with neutron parts, I would like  to start 
working on neutron mechanism driver support.  I have already started to arrange 
the following blueprint doc based on everyone's ideas:
https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit

For the basic PCI pass-through networking case we can assume the following:

1.   Single provider network (PN1)

2.   White list of available SRIOV PCI devices for allocation as NIC for 
neutron networks on provider network  (PN1) is defined on each compute node

3.   Support directly assigned SRIOV PCI pass-through device as vNIC. (This 
will limit the number of tests)

4.   More 


If my suggestion seems reasonable to you, let's try to reach an agreement and 
split the work during our Monday IRC meeting.

BR,
Irena

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Saturday, January 11, 2014 8:36 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Comments with prefix [yjiang5_2] , including the double confirm.

I think we (you and me) is mostly on the same page, would you please give a 
summary, and then we can have community , including Irena/Robert, to check it. 
We need Cores to sponsor it. We should check with John to see if this is 
different with his mentor picture, and we may need a neutron core (I assume 
Cisco has a bunch of Neutron cores :) )to sponsor it?

And, will anyone from Cisco can help on the implementation? After this long 
discussion, we are in half bottom of I release and I'm not sure if Yongli and I 
alone can finish them in I release.

Thanks
--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 6:34 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support



 OK - so if this is good then I think the question is how we could change the 
 'pci_whitelist' parameter we have - which, as you say, should either *only* 
 do whitelisting or be renamed - to allow us to add information.  Yongli has 
 something along those lines but it's not flexible and it distinguishes poorly 
 between which bits are extra information and which bits are matching 
 expressions (and it's still called pci_whitelist) - but even with those 
 criticisms it's very close to what we're talking about.  When we have that I 
 think a lot of the rest of the arguments should simply resolve themselves.



 [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
 change to pci_whitelist is because it combined two things. So a stupid/naive 
 solution in my head is, change it to VERY generic name, 
 'pci_devices_information',

 and change schema as an array of {'devices_property'=regex exp, 'group_name' 
 = 'g1'} dictionary, and the device_property expression can be 'address ==xxx, 
 vendor_id == xxx' (i.e. similar with current white list),  and we can squeeze 
 more into the pci_devices_information in future, like 'network_information' 
 = xxx or Neutron specific information you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ match = { class = Acme inc. discombobulator }, info = { group = we like 
teh groups, volume = 11 } }

[yjiang5_2] Just to double-confirm: 'match' is the whitelist, and 'info' is the 'extra 
info', right?  Can the keys be more meaningful, for example, 
s/match

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Jiang, Yunhong
Ian, I'm not sure I get your question. Why should the scheduler get the number of 
flavor types requested? The scheduler will only translate the PCI flavor to the 
pci property match requirement like it does now (either vendor_id, device_id, 
or an item in extra_info), then match the translated pci flavor, i.e. the pci 
requests, against the pci stats.
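
To make that flow concrete, here is a minimal Python sketch of the translate-then-match 
step; the flavor and pci_stats layouts below are illustrative only, not the actual nova 
data model:

# a PCI flavor, already translated into a property-match requirement
flavor = {"name": "fast_nic", "match": {"vendor_id": "8086", "device_class": "nic"}}

# pci_stats: one row per unique combination of tracked properties, with a free count
pci_stats = [
    {"vendor_id": "8086", "device_class": "nic", "count": 4},
    {"vendor_id": "15b3", "device_class": "nic", "count": 2},
]

def pools_for_request(request, stats):
    # a pool satisfies the request if every requested property matches
    return [pool for pool in stats
            if all(pool.get(k) == v for k, v in request.items())]

# the scheduler only needs a pool with free devices, not the individual devices
usable = [p for p in pools_for_request(flavor["match"], pci_stats) if p["count"] > 0]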

Thanks
--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Monday, January 13, 2014 10:57 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

It's worth noting that this makes the scheduling a computationally hard 
problem. The answer to that in this scheme is to reduce the number of inputs to 
trivialise the problem.  It's going to be O(f(number of flavor types requested, 
number of pci_stats pools)) and if you group appropriately there shouldn't be 
an excessive number of pci_stats pools.  I am not going to stand up and say 
this makes it achievable - and if it doesn't then I'm not sure that anything 
would make overlapping flavors achievable - but I think it gives us some hope.
--
Ian.

On 13 January 2014 19:27, Jiang, Yunhong 
yunhong.ji...@intel.commailto:yunhong.ji...@intel.com wrote:
Hi Robert, the scheduler keeps counts based on pci_stats instead of the pci flavor.

As stated by Ian at 
https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg13455.html 
already, the flavor will only use the tags used by pci_stats.

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.commailto:ba...@cisco.com]
Sent: Monday, January 13, 2014 8:22 AM

To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

As I have responded in the other email, if I understand PCI flavor correctly, then 
the issue that we need to deal with is overlap. The simplest case of this overlap is 
that you can define a flavor F1 as [vendor_id='v', product_id='p'], and a flavor F2 
as [vendor_id = 'v'].  Let's assume that only the admin can define the flavors. It's 
not hard to see that a device can belong to two different flavors at the same time. 
This introduces an issue in the scheduler. Suppose the scheduler (counts or stats 
based) maintains counts based on flavors (or the keys corresponding to the flavors). 
To satisfy a request for a device with flavor F1, the count for F2 needs to be 
decremented as well. There may be several ways to achieve that, but regardless, it 
introduces tremendous overhead in terms of system processing and administrative costs.
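
A tiny sketch of the overlap described above, assuming a hypothetical per-flavor 
counter scheme (none of this is the current nova data model):

F1 = {"vendor_id": "v", "product_id": "p"}
F2 = {"vendor_id": "v"}

device_pool = {"vendor_id": "v", "product_id": "p", "count": 8}

def matches(flavor, pool):
    # a flavor matches a pool if all of its key/value constraints are met
    return all(pool.get(k) == val for k, val in flavor.items())

# the same pool is visible through both flavors, so per-flavor counters overlap
per_flavor_free = {name: device_pool["count"]
                   for name, f in (("F1", F1), ("F2", F2)) if matches(f, device_pool)}

# allocating one device via F1 also has to adjust F2's view of free devices,
# which is the bookkeeping overhead being objected to here
device_pool["count"] -= 1
for name in per_flavor_free:
    per_flavor_free[name] -= 1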

What are the use cases for that? How practical are those use cases?

thanks,
Robert

On 1/10/14 9:34 PM, Ian Wells 
ijw.ubu...@cack.org.ukmailto:ijw.ubu...@cack.org.uk wrote:



 OK - so if this is good then I think the question is how we could change the 
 'pci_whitelist' parameter we have - which, as you say, should either *only* 
 do whitelisting or be renamed - to allow us to add information.  Yongli has 
 something along those lines but it's not flexible and it distinguishes poorly 
 between which bits are extra information and which bits are matching 
 expressions (and it's still called pci_whitelist) - but even with those 
 criticisms it's very close to what we're talking about.  When we have that I 
 think a lot of the rest of the arguments should simply resolve themselves.



 [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
 change to pci_whitelist is because it combined two things. So a stupid/naive 
 solution in my head is, change it to VERY generic name, 
 'pci_devices_information',

 and change schema as an array of {'devices_property'=regex exp, 'group_name' 
 = 'g1'} dictionary, and the device_property expression can be 'address ==xxx, 
 vendor_id == xxx' (i.e. similar with current white list),  and we can squeeze 
 more into the pci_devices_information in future, like 'network_information' 
 = xxx or Neutron specific information you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ match = { class = Acme inc. discombobulator }, info = { group = we like 
teh groups, volume = 11 } }


 All keys other than 'device_property' becomes extra information, i.e. 
 software defined property. These extra information will be carried with the 
 PCI devices,. Some implementation details, A)we can limit the acceptable 
 keys, like we only support 'group_name', 'network_id', or we can accept any 
 keys other than reserved (vendor_id, device_id etc) one.


Not sure we have a good list of reserved keys at the moment, and with two dicts 
it isn't really necessary, I guess.  I would say that we have one match parser 
which looks something like this:

# does this PCI device match the expression given?
def match(expression, pci_details, extra_specs):
    for (k, v) in expression.items():
        if pci_details.get(k, extra_specs.get(k)) != v:
            return False
    return True

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Ian Wells
If there are N flavor types there are N match expressions so I think it's
pretty much equivalent in terms of complexity.  It looks like some sort of
packing problem to me, trying to fit N objects into M boxes, hence my
statement that it's not going to be easy, but that's just a gut feeling -
some of the matches can be vague, such as only the vendor ID or a vendor
and two device types, so it's not as simple as one flavor matching one
stats row.
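
As a rough illustration of the packing aspect, a purely hypothetical sketch in which 
each request is a predicate over a pci_stats row and each pool carries a free count:

def assign(requests, pools):
    # exhaustively try to satisfy every request from the pools, with backtracking
    if not requests:
        return True
    satisfies, rest = requests[0], requests[1:]
    for pool in pools:
        if pool["count"] > 0 and satisfies(pool):
            pool["count"] -= 1           # tentatively take a device from this pool
            if assign(rest, pools):
                return True
            pool["count"] += 1           # backtrack and try the next pool
    return False

# e.g. two vague requests ("any Intel device") competing for overlapping pools
pools = [{"vendor_id": "8086", "count": 1},
         {"vendor_id": "8086", "product_id": "10ed", "count": 1}]
ok = assign([lambda p: p.get("vendor_id") == "8086"] * 2, pools)
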
-- 
Ian.


On 13 January 2014 21:00, Jiang, Yunhong yunhong.ji...@intel.com wrote:

  Ian, not sure if I get your question. Why should scheduler get the
 number of flavor types requested? The scheduler will only translate the PCI
 flavor to the pci property match requirement like it does now, (either
 vendor_id, device_id, or item in extra_info), then match the translated pci
 flavor, i.e. pci requests, to the pci stats.



 Thanks

 --jyh



 *From:* Ian Wells [mailto:ijw.ubu...@cack.org.uk]
 *Sent:* Monday, January 13, 2014 10:57 AM

 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support



 It's worth noting that this makes the scheduling a computationally hard
 problem. The answer to that in this scheme is to reduce the number of
 inputs to trivialise the problem.  It's going to be O(f(number of flavor
 types requested, number of pci_stats pools)) and if you group appropriately
 there shouldn't be an excessive number of pci_stats pools.  I am not going
 to stand up and say this makes it achievable - and if it doesn't then I'm
 not sure that anything would make overlapping flavors achievable - but I
 think it gives us some hope.
 --

 Ian.



 On 13 January 2014 19:27, Jiang, Yunhong yunhong.ji...@intel.com wrote:

 Hi, Robert, scheduler keep count based on pci_stats instead of the pci
 flavor.



 As stated by Ian at
 https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg13455.html already,
  the flavor will only use the tags used by pci_stats.



 Thanks

 --jyh



 *From:* Robert Li (baoli) [mailto:ba...@cisco.com]
 *Sent:* Monday, January 13, 2014 8:22 AM


 *To:* OpenStack Development Mailing List (not for usage questions)
 *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support



 As I have responded in the other email, and If I understand PCI flavor
 correctly, then the issue that we need to deal with is the overlapping
 issue. A simplest case of this overlapping is that you can define a flavor
 F1 as [vendor_id='v', product_id='p'], and a flavor F2 as [vendor_id = 'v']
 .  Let's assume that only the admin can define the flavors. It's not hard
 to see that a device can belong to the two different flavors in the same
 time. This introduces an issue in the scheduler. Suppose the scheduler
 (counts or stats based) maintains counts based on flavors (or the keys
 corresponding to the flavors). To request a device with the flavor F1,
  counts in F2 needs to be subtracted by one as well. There may be several
 ways to achieve that. But regardless, it introduces tremendous overhead in
 terms of system processing and administrative costs.



 What are the use cases for that? How practical are those use cases?



 thanks,

 Robert



 On 1/10/14 9:34 PM, Ian Wells ijw.ubu...@cack.org.uk wrote:




 
  OK - so if this is good then I think the question is how we could change
 the 'pci_whitelist' parameter we have - which, as you say, should either
 *only* do whitelisting or be renamed - to allow us to add information.
  Yongli has something along those lines but it's not flexible and it
 distinguishes poorly between which bits are extra information and which
 bits are matching expressions (and it's still called pci_whitelist) - but
 even with those criticisms it's very close to what we're talking about.
  When we have that I think a lot of the rest of the arguments should simply
 resolve themselves.
 
 
 
  [yjiang5_1] The reason that not easy to find a flexible/distinguishable
 change to pci_whitelist is because it combined two things. So a
 stupid/naive solution in my head is, change it to VERY generic name,
 ‘pci_devices_information’,
 
  and change schema as an array of {‘devices_property’=regex exp,
 ‘group_name’ = ‘g1’} dictionary, and the device_property expression can be
 ‘address ==xxx, vendor_id == xxx’ (i.e. similar with current white list),
  and we can squeeze more into the “pci_devices_information” in future, like
 ‘network_information’ = xxx or “Neutron specific information” you required
 in previous mail.


 We're getting to the stage that an expression parser would be useful,
 annoyingly, but if we are going to try and squeeze it into JSON can I
 suggest:

 { match = { class = Acme inc. discombobulator }, info = { group = we
 like teh groups, volume = 11 } }

 
  All keys other than ‘device_property’ becomes extra information, i.e.
 software defined property. These extra information will be carried with the
 PCI devices

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-12 Thread Irena Berezovsky
Hi John,
Thank you for taking the initiative and summing up the work that needs to be done 
to provide PCI pass-through network support.
The only item I think is missing is the neutron support for PCI pass-through. 
Currently we have the Mellanox plugin that supports PCI pass-through assuming the 
Mellanox adapter card's embedded switch technology. But in order to have fully 
integrated PCI pass-through networking support for the use cases Robert listed in 
his previous mail, generic neutron PCI pass-through support is required. This can 
be enhanced with vendor-specific tasks that may differ (Mellanox embedded switch vs 
Cisco 802.1BR), but there is still the common part of being a PCI-aware mechanism 
driver.
I have already started on the definition for this part:
https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit#
I also plan to start coding soon.

Depending on how it goes, I can also take the nova parts that integrate with the 
neutron APIs from item 3.
 
Regards,
Irena

-Original Message-
From: John Garbutt [mailto:j...@johngarbutt.com] 
Sent: Friday, January 10, 2014 4:34 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Apologies for this top post, I just want to move this discussion towards action.

I am traveling next week so it is unlikely that I can make the meetings. Sorry.

Can we please agree on some concrete actions, and who will do the coding?
This also means raising new blueprints for each item of work.
I am happy to review and eventually approve those blueprints, if you email me 
directly.

Ideas are taken from what we started to agree on, mostly written up here:
https://wiki.openstack.org/wiki/Meetings/Passthrough#Definitions


What doesn't need doing...


We have PCI whitelist and PCI alias at the moment; let's keep those names the 
same for now.
I personally prefer PCI-flavor rather than PCI-alias, but let's discuss any 
rename separately.

We seemed happy with the current system (roughly) around GPU passthrough:
nova flavor-key three_GPU_attached_30GB set pci_passthrough:alias= 
large_GPU:1,small_GPU:2
nova boot --image some_image --flavor three_GPU_attached_30GB some_name

Again, we seemed happy with the current PCI whitelist.

Sure, we could optimise the scheduling, but again, please keep that a separate 
discussion.
Something in the scheduler needs to know how many of each PCI alias are 
available on each host.
How that information gets there can be changed at a later date.

PCI alias is in config, but it's probably better defined using host aggregates, 
or some custom API.
But let's leave that for now, and discuss it separately.
If the need arises, we can migrate away from the config.


What does need doing...
==

1) API & CLI changes for nic-type, and associated tempest tests

* Add a user visible nic-type so users can express one of several network 
types.
* We need a default nic-type, for when the user doesn't specify one (might 
default to SRIOV in some cases)
* We can easily test the case where the default is virtual and the user 
expresses a preference for virtual
* The above is much better than not testing it at all.

nova boot --flavor m1.large --image image_id
  --nic net-id=net-id-1
  --nic net-id=net-id-2,nic-type=fast
  --nic net-id=net-id-3,nic-type=fast vm-name

or

neutron port-create
  --fixed-ip subnet_id=subnet-id,ip_address=192.168.57.101
  --nic-type=slow | fast | foobar
  net-id
nova boot --flavor m1.large --image image_id --nic port-id=port-id

Where nic-type is just an extra bit of metadata (a string) that is passed to nova 
and the VIF driver.


2) Expand PCI alias information

We need extensions to PCI alias so we can group SRIOV devices better.

I still think we are yet to agree on a format, but I would suggest this as a 
starting point:

{
 "name": "GPU_fast",
 "devices": [
  {"vendor_id": "1137", "product_id": "0071", "address": "*", "attach-type": "direct"},
  {"vendor_id": "1137", "product_id": "0072", "address": "*", "attach-type": "direct"}
 ],
 "sriov_info": {}
}

{
 "name": "NIC_fast",
 "devices": [
  {"vendor_id": "1137", "product_id": "0071", "address": "0:[1-50]:2:*", "attach-type": "macvtap"},
  {"vendor_id": "1234", "product_id": "0081", "address": "*", "attach-type": "direct"}
 ],
 "sriov_info": {
  "nic_type": "fast",
  "network_ids": ["net-id-1", "net-id-2"]
 }
}

{
 "name": "NIC_slower",
 "devices": [
  {"vendor_id": "1137", "product_id": "0071", "address": "*", "attach-type": "direct"},
  {"vendor_id": "1234", "product_id": "0081", "address": "*", "attach-type": "direct"}
 ],
 "sriov_info": {
  "nic_type": "fast",
  "network_ids": ["*"]  # this means it could attach to any network
 }
}

The idea being the VIF driver gets passed this info, when network_info includes 
a nic that matches.
Any other details, like VLAN id, would come from neutron, and passed to the VIF 
driver as normal.


3) Reading nic_type and doing the PCI passthrough of NIC user requests

Not sure we are agreed on this, but basically:
* network_info contains nic-type from neutron
* need to select the correct VIF

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
On 10 January 2014 07:40, Jiang, Yunhong yunhong.ji...@intel.com wrote:

  Robert, sorry that I’m not fan of * your group * term. To me, *your
 group” mixed two thing. It’s an extra property provided by configuration,
 and also it’s a very-not-flexible mechanism to select devices (you can only
 select devices based on the ‘group name’ property).


It is exactly that.  It's 0 new config items, 0 new APIs, just an extra tag
on the whitelists that are already there (although the proposal suggests
changing the name of them to be more descriptive of what they now do).  And
you talk about flexibility as if this changes frequently, but in fact the
grouping / aliasing of devices almost never changes after installation,
which is, not coincidentally, when the config on the compute nodes gets set
up.

  1)   A dynamic group is much better. For example, user may want to
 select GPU device based on vendor id, or based on vendor_id+device_id. In
 another word, user want to create group based on vendor_id, or
 vendor_id+device_id and select devices from these group.  John’s proposal
 is very good, to provide an API to create the PCI flavor(or alias). I
 prefer flavor because it’s more openstack style.

I disagree with this.  I agree that what you're saying offers a more
flexibilibility after initial installation but I have various issues with
it.

This is directly related to the hardware configuration on each compute
node.  For (some) other things of this nature, like provider networks, the
compute node is the only thing that knows what it has attached to it, and
it is the store (in configuration) of that information.  If I add a new
compute node then it's my responsibility to configure it correctly on
attachment, but when I add a compute node (when I'm setting the cluster up,
or sometime later on) then it's at that precise point that I know how I've
attached it and what hardware it's got on it.  Also, it's at that point
in time that I write out the configuration file (not by hand, note;
there's almost certainly automation when configuring hundreds of nodes so
arguments that 'if I'm writing hundreds of config files one will be wrong'
are moot).

I'm also not sure there's much reason to change the available devices
dynamically after that, since that's normally an activity that results from
changing the physical setup of the machine which implies that actually
you're going to have access to and be able to change the config as you do
it.  John did come up with one case where you might be trying to remove old
GPUs from circulation, but it's a very uncommon case that doesn't seem
worth coding for, and it's still achievable by changing the config and
restarting the compute processes.

This also reduces the autonomy of the compute node in favour of centralised
tracking, which goes against the 'distributed where possible' philosophy of
Openstack.

Finally, you're not actually removing configuration from the compute node.
You still have to configure a whitelist there; in the grouping design you
also have to configure grouping (flavouring) on the control node as well.
The groups proposal adds one extra piece of information to the whitelists
that are already there to mark groups, not a whole new set of config lines.


To compare scheduling behaviour:

If I  need 4G of RAM, each compute node has reported its summary of free
RAM to the scheduler.  I look for a compute node with 4G free, and filter
the list of compute nodes down.  This is a query on n records, n being the
number of compute nodes.  I schedule to the compute node, which then
confirms it does still have 4G free and runs the VM or rejects the request.

If I need 3 PCI devices and use the current system, each machine has
reported its device allocations to the scheduler.  With SRIOV multiplying
up the number of available devices, it's reporting back hundreds of records
per compute node to the schedulers, and the filtering activity is 3
queries on n * (number of PCI devices in the cloud) records, which could easily
end up in the tens or even hundreds of thousands of records for a
moderately sized cloud.  The compute node also has a record of its device
allocations which is also checked and updated before the final request is
run.

If I need 3 PCI devices and use the groups system, each machine has
reported its device *summary* to the scheduler.  With SRIOV multiplying up
the number of available devices, it's still reporting one or a small number
of categories, i.e. { net: 100}.  The difficulty of scheduling is a query
on num groups * n records - fewer, in fact, if some machines have no
passthrough devices.

You can see that there's quite a cost to be paid for having those flexible
alias APIs.
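
A small sketch of the difference being described, with illustrative names only: the 
compute node collapses hundreds of per-VF records into a handful of per-group rows 
before reporting.

# per-device view on the compute node: one record per virtual function
devices = [{"group": "net", "address": "0000:06:10.%d" % i} for i in range(100)]

def summarise(devs):
    # collapse per-device records into {group: free count} before reporting
    summary = {}
    for dev in devs:
        summary[dev["group"]] = summary.get(dev["group"], 0) + 1
    return summary

print(summarise(devices))   # {'net': 100} - one row instead of 100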

 4)   IMHO, the core for nova PCI support is **PCI property**. The
 property means not only generic PCI devices like vendor id, device id,
 device type, compute specific property like BDF address, the adjacent
 switch IP address,  but also user defined property like 

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
In any case, we don't have to decide this now.  If we simply allowed the
whitelist to add extra arbitrary properties to the PCI record (like a group
name) and return it to the central server, we could use that in scheduling
for the minute as a group name, we wouldn't implement the APIs for flavors
yet, and we could get a working system that would be minimally changed from
what we already have.  We could worry about the scheduling in the
scheduling group, and we could leave the APIs (which, as I say, are a
minimally useful feature) until later.  Then we'd have something useful in
short order.
-- 
Ian.


On 10 January 2014 13:08, Ian Wells ijw.ubu...@cack.org.uk wrote:

 On 10 January 2014 07:40, Jiang, Yunhong yunhong.ji...@intel.com wrote:

  Robert, sorry that I’m not fan of * your group * term. To me, *your
 group” mixed two thing. It’s an extra property provided by configuration,
 and also it’s a very-not-flexible mechanism to select devices (you can only
 select devices based on the ‘group name’ property).


 It is exactly that.  It's 0 new config items, 0 new APIs, just an extra
 tag on the whitelists that are already there (although the proposal
 suggests changing the name of them to be more descriptive of what they now
 do).  And you talk about flexibility as if this changes frequently, but in
 fact the grouping / aliasing of devices almost never changes after
 installation, which is, not coincidentally, when the config on the compute
 nodes gets set up.

  1)   A dynamic group is much better. For example, user may want to
 select GPU device based on vendor id, or based on vendor_id+device_id. In
 another word, user want to create group based on vendor_id, or
 vendor_id+device_id and select devices from these group.  John’s proposal
 is very good, to provide an API to create the PCI flavor(or alias). I
 prefer flavor because it’s more openstack style.

 I disagree with this.  I agree that what you're saying offers a more
 flexibilibility after initial installation but I have various issues with
 it.

 This is directly related to the hardware configuration on each compute
 node.  For (some) other things of this nature, like provider networks, the
 compute node is the only thing that knows what it has attached to it, and
 it is the store (in configuration) of that information.  If I add a new
 compute node then it's my responsibility to configure it correctly on
 attachment, but when I add a compute node (when I'm setting the cluster up,
 or sometime later on) then it's at that precise point that I know how I've
 attached it and what hardware it's got on it.  Also, it's at that
 point in time that I write out the configuration file (not by hand, note;
 there's almost certainly automation when configuring hundreds of nodes so
 arguments that 'if I'm writing hundreds of config files one will be wrong'
 are moot).

 I'm also not sure there's much reason to change the available devices
 dynamically after that, since that's normally an activity that results from
 changing the physical setup of the machine which implies that actually
 you're going to have access to and be able to change the config as you do
 it.  John did come up with one case where you might be trying to remove old
 GPUs from circulation, but it's a very uncommon case that doesn't seem
 worth coding for, and it's still achievable by changing the config and
 restarting the compute processes.

 This also reduces the autonomy of the compute node in favour of
 centralised tracking, which goes against the 'distributed where possible'
 philosophy of Openstack.

 Finally, you're not actually removing configuration from the compute
 node.  You still have to configure a whitelist there; in the grouping
 design you also have to configure grouping (flavouring) on the control node
 as well.  The groups proposal adds one extra piece of information to the
 whitelists that are already there to mark groups, not a whole new set of
 config lines.


 To compare scheduling behaviour:

 If I  need 4G of RAM, each compute node has reported its summary of free
 RAM to the scheduler.  I look for a compute node with 4G free, and filter
 the list of compute nodes down.  This is a query on n records, n being the
 number of compute nodes.  I schedule to the compute node, which then
 confirms it does still have 4G free and runs the VM or rejects the request.

 If I need 3 PCI devices and use the current system, each machine has
 reported its device allocations to the scheduler.  With SRIOV multiplying
 up the number of available devices, it's reporting back hundreds of records
 per compute node to the schedulers, and the filtering activity is 3
 queries on n * (number of PCI devices in the cloud) records, which could easily
 end up in the tens or even hundreds of thousands of records for a
 moderately sized cloud.  The compute node also has a record of its device
 allocations which is also checked and updated before the final request is
 run.

 If I need 3 PCI 

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread John Garbutt
Apologies for this top post, I just want to move this discussion towards action.

I am traveling next week so it is unlikely that I can make the meetings. Sorry.

Can we please agree on some concrete actions, and who will do the coding?
This also means raising new blueprints for each item of work.
I am happy to review and eventually approve those blueprints, if you
email me directly.

Ideas are taken from what we started to agree on, mostly written up here:
https://wiki.openstack.org/wiki/Meetings/Passthrough#Definitions


What doesn't need doing...


We have PCI whitelist and PCI alias at the moment; let's keep those
names the same for now.
I personally prefer PCI-flavor rather than PCI-alias, but let's
discuss any rename separately.

We seemed happy with the current system (roughly) around GPU passthrough:
nova flavor-key three_GPU_attached_30GB set
pci_passthrough:alias= large_GPU:1,small_GPU:2
nova boot --image some_image --flavor three_GPU_attached_30GB some_name

Again, we seemed happy with the current PCI whitelist.

Sure, we could optimise the scheduling, but again, please keep that a
separate discussion.
Something in the scheduler needs to know how many of each PCI alias
are available on each host.
How that information gets there can be changed at a later date.

PCI alias is in config, but it's probably better defined using host
aggregates, or some custom API.
But let's leave that for now, and discuss it separately.
If the need arises, we can migrate away from the config.


What does need doing...
==

1) API & CLI changes for nic-type, and associated tempest tests

* Add a user visible nic-type so users can express one of several
network types.
* We need a default nic-type, for when the user doesn't specify one
(might default to SRIOV in some cases)
* We can easily test the case where the default is virtual and the
user expresses a preference for virtual
* The above is much better than not testing it at all.

nova boot --flavor m1.large --image image_id
  --nic net-id=net-id-1
  --nic net-id=net-id-2,nic-type=fast
  --nic net-id=net-id-3,nic-type=fast vm-name

or

neutron port-create
  --fixed-ip subnet_id=subnet-id,ip_address=192.168.57.101
  --nic-type=slow | fast | foobar
  net-id
nova boot --flavor m1.large --image image_id --nic port-id=port-id

Where nic-type is just an extra bit of metadata (a string) that is passed to
nova and the VIF driver.


2) Expand PCI alias information

We need extensions to PCI alias so we can group SRIOV devices better.

I still think we are yet to agree on a format, but I would suggest
this as a starting point:

{
 "name": "GPU_fast",
 "devices": [
  {"vendor_id": "1137", "product_id": "0071", "address": "*", "attach-type": "direct"},
  {"vendor_id": "1137", "product_id": "0072", "address": "*", "attach-type": "direct"}
 ],
 "sriov_info": {}
}

{
 "name": "NIC_fast",
 "devices": [
  {"vendor_id": "1137", "product_id": "0071", "address": "0:[1-50]:2:*", "attach-type": "macvtap"},
  {"vendor_id": "1234", "product_id": "0081", "address": "*", "attach-type": "direct"}
 ],
 "sriov_info": {
  "nic_type": "fast",
  "network_ids": ["net-id-1", "net-id-2"]
 }
}

{
 "name": "NIC_slower",
 "devices": [
  {"vendor_id": "1137", "product_id": "0071", "address": "*", "attach-type": "direct"},
  {"vendor_id": "1234", "product_id": "0081", "address": "*", "attach-type": "direct"}
 ],
 "sriov_info": {
  "nic_type": "fast",
  "network_ids": ["*"]  # this means it could attach to any network
 }
}

The idea being the VIF driver gets passed this info, when network_info
includes a nic that matches.
Any other details, like VLAN id, would come from neutron, and passed
to the VIF driver as normal.
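
To illustrate the matching step (this is only a sketch against the example structures 
above, not an agreed nova API), a VIF driver handed the alias list plus a nic's type 
and network could pick the applicable alias like this:

def find_alias_for_nic(aliases, nic_type, network_id):
    # return the first alias whose sriov_info covers this nic_type and network
    for alias in aliases:
        info = alias.get("sriov_info") or {}
        if info.get("nic_type") != nic_type:
            continue
        networks = info.get("network_ids", [])
        if "*" in networks or network_id in networks:
            return alias
    return None

# e.g. find_alias_for_nic(alias_list, "fast", "net-id-1") would select "NIC_fast" above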


3) Reading nic_type and doing the PCI passthrough of NIC user requests

Not sure we are agreed on this, but basically:
* network_info contains nic-type from neutron
* need to select the correct VIF driver
* need to pass matching PCI alias information to VIF driver
* neutron passes details other details (like VLAN id) as before
* nova gives VIF driver an API that allows it to attach PCI devices
that are in the whitelist to the VM being configured
* with all this, the VIF driver can do what it needs to do
* let's keep it simple, and expand it as the need arises

4) Make changes to VIF drivers, so the above is implemented

Depends on (3)



These seem like some good steps to get the basics in place for PCI
passthrough networking.
Once it's working, we can review it and see if there are things that
need to evolve further.

Does that seem like a workable approach?
Who is willing to implement any of (1), (2) and (3)?


Cheers,
John


On 9 January 2014 17:47, Ian Wells ijw.ubu...@cack.org.uk wrote:
 I think I'm in agreement with all of this.  Nice summary, Robert.

 It may not be where the work ends, but if we could get this done the rest is
 just refinement.


 On 9 January 2014 17:49, Robert Li (baoli) ba...@cisco.com wrote:

 Hi Folks,


 With John joining the IRC, so far, we had a couple of productive meetings
 in an effort to come to consensus and move forward. Thanks John for doing
 that, and I appreciate everyone's effort to make it to the daily 

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Alan Kavanagh
+1 PCI Flavor.

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: January-10-14 1:56 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

BTW, I like the PCI flavor :)

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Thursday, January 09, 2014 10:41 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Ian, when you say you're in agreement with all of this, do you agree with the 
'group name', or with John's PCI flavor?
I'm against the PCI group and will send out a reply later.

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Thursday, January 09, 2014 9:47 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

I think I'm in agreement with all of this.  Nice summary, Robert.
It may not be where the work ends, but if we could get this done the rest is 
just refinement.

On 9 January 2014 17:49, Robert Li (baoli) 
ba...@cisco.commailto:ba...@cisco.com wrote:
Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
vendor_id:,product_id:}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={vendor_id:, product_id:, name:str}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:pci_passthrough:alias=name:count

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ vendor_id:,product_id:, 
name:str}]

By doing so, we eliminated the PCI alias. And we call the name in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the --nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.

Further, we are saying that we can define default PCI groups based on the PCI 
device's class.

For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's connected to 
a virtual switch, a nic that is connected to a physical switch, or a nic that 
is connected to a physical switch, but has a host macvtap device in between. 
The actual

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Robert Li (baoli)
 design, I 
presume. The bottom line is that we want those requirements to be met.


4)   IMHO, the core for nova PCI support is *PCI property*. The property 
means not only generic PCI devices like vendor id, device id, device type, 
compute specific property like BDF address, the adjacent switch IP address,  
but also user defined property like nuertron’s physical net name etc. And then, 
it’s about how to get these property, how to select/group devices based on the 
property, how to store/fetch these properties.



I agree. But that's exactly what we are trying to accomplish.

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Thursday, January 09, 2014 8:49 AM
To: OpenStack Development Mailing List (not for usage questions); Irena 
Berezovsky; Sandhya Dasu (sadasu); Jiang, Yunhong; Itzik Brown; 
j...@johngarbutt.commailto:j...@johngarbutt.com; He, Yongli
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
vendor_id:,product_id:}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={vendor_id:, product_id:, name:str}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:pci_passthrough:alias=name:count

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ vendor_id:,product_id:, 
name:str}]

By doing so, we eliminated the PCI alias. And we call the name in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the --nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.

Further, we are saying that we can define default PCI groups based on the PCI 
device's class.

For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's connected to 
a virtual switch, a nic that is connected to a physical switch, or a nic that 
is connected to a physical switch, but has a host macvtap device in between. 
The actual names of the choices are not important here, and can be debated.

I'm hoping that we can go over the above on Monday. But any comments are 
welcome by email.

Thanks,
Robert

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Robert Li (baoli)
Hi Yongli,

Please also see my response to Yunhong. Here, I just want to add a comment 
about your local versus global argument. I took a brief look at your patches, 
and the PCI-flavor is added into the whitelist. The compute node needs to know 
these pci-flavors in order to report PCI stats based on them. Please correct me 
if I'm wrong.

Another comment is that a compute node doesn't need to consult with the 
controller, but its report or registration of resources may be rejected by the 
controller due to non-existent PCI groups.

thanks,
Robert

On 1/10/14 2:11 AM, yongli he 
yongli...@intel.commailto:yongli...@intel.com wrote:

On 2014-01-10 00:49, Robert Li (baoli) wrote:
Hi Folks,
Hi all,

Basically I favor the pci-flavor style and am against mixing this into the white-list. 
Please see my inline comments.



With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
vendor_id:,product_id:}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={vendor_id:, product_id:, name:str}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:pci_passthrough:alias=name:count

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ vendor_id:,product_id:, 
name:str}]

By doing so, we eliminated the PCI alias. And we call the name in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of
the white list configuration is mostly local to a host, so only address in 
there, like John's proposal is good. mix the group into the whitelist means we 
make the global thing per host style, this is maybe wrong.

benefits can be harvested:

 * the implementation is significantly simplified
but it is more of a mess; see my new patches already sent out.
 * provisioning is simplified by eliminating the PCI alias
PCI alias provides a good way to define a global, referenceable name for PCI devices; 
we need this, and the same is true of John's pci-flavor.
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
Simplifying this seems good, but it does not actually simplify things; keeping the local 
and the global separate is the natural simplification.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
This means you have to consult the controller to deploy your host; if we keep the 
white-list local, we simplify the deployment.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the --nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.
I still feel using the alias/pci flavor is the better solution.

Further, we are saying that we can define default PCI groups based on the PCI 
device's class.
default 

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
Ian, thanks for your reply. Please check my responses prefixed with 'yjiang5'.

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 4:08 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

On 10 January 2014 07:40, Jiang, Yunhong 
yunhong.ji...@intel.commailto:yunhong.ji...@intel.com wrote:
Robert, sorry that I'm not fan of * your group * term. To me, *your group 
mixed two thing. It's an extra property provided by configuration, and also 
it's a very-not-flexible mechanism to select devices (you can only select 
devices based on the 'group name' property).

It is exactly that.  It's 0 new config items, 0 new APIs, just an extra tag on 
the whitelists that are already there (although the proposal suggests changing 
the name of them to be more descriptive of what they now do).  And you talk 
about flexibility as if this changes frequently, but in fact the grouping / 
aliasing of devices almost never changes after installation, which is, not 
coincidentally, when the config on the compute nodes gets set up.

1)   A dynamic group is much better. For example, user may want to select 
GPU device based on vendor id, or based on vendor_id+device_id. In another 
word, user want to create group based on vendor_id, or vendor_id+device_id and 
select devices from these group.  John's proposal is very good, to provide an 
API to create the PCI flavor(or alias). I prefer flavor because it's more 
openstack style.
I disagree with this.  I agree that what you're saying offers a more 
flexibilibility after initial installation but I have various issues with it.
[yjiang5] I think you are talking mostly about the white list, instead of the PCI 
flavor. The PCI flavor is more about the PCI request, like: I want to have a device 
with vendor_id = cisco, device_id = 15454E, or 'vendor_id=intel device_class=nic' 
(because the image has the driver for all Intel NIC cards :) ). The whitelist, on the 
other hand, is to decide which devices are assignable on a host.


This is directly related to the hardware configuration on each compute node.  
For (some) other things of this nature, like provider networks, the compute 
node is the only thing that knows what it has attached to it, and it is the 
store (in configuration) of that information.  If I add a new compute node then 
it's my responsibility to configure it correctly on attachment, but when I add 
a compute node (when I'm setting the cluster up, or sometime later on) then 
it's at that precise point that I know how I've attached it and what hardware 
it's got on it.  Also, it's at that point in time that I write out the 
configuration file (not by hand, note; there's almost certainly automation when 
configuring hundreds of nodes so arguments that 'if I'm writing hundreds of 
config files one will be wrong' are moot).

I'm also not sure there's much reason to change the available devices 
dynamically after that, since that's normally an activity that results from 
changing the physical setup of the machine which implies that actually you're 
going to have access to and be able to change the config as you do it.  John 
did come up with one case where you might be trying to remove old GPUs from 
circulation, but it's a very uncommon case that doesn't seem worth coding for, 
and it's still achievable by changing the config and restarting the compute 
processes.
[yjiang5] I totally agree with you that the whitelist is statically defined at 
provisioning time. I just want to separate the 'provider network' information out 
into another configuration (like extra information). The whitelist is just a white 
list to decide which devices are assignable. The provider network is information 
about the device; it's not in the scope of the white list.
This also reduces the autonomy of the compute node in favour of centralised 
tracking, which goes against the 'distributed where possible' philosophy of 
Openstack.
Finally, you're not actually removing configuration from the compute node.  You 
still have to configure a whitelist there; in the grouping design you also have 
to configure grouping (flavouring) on the control node as well.  The groups 
proposal adds one extra piece of information to the whitelists that are already 
there to mark groups, not a whole new set of config lines.
[yjiang5] Still, the white list is to decide which devices are assignable, not to 
provide device information. We shouldn't mix functionality into the configuration. If 
it's ok, I simply want to discard the 'group' term :) The nova PCI flow is simple: the 
compute node provides PCI devices (based on the white list), the scheduler tracks the 
PCI device information (abstracted as pci_stats for performance reasons), and the API 
provides the method for the user to specify the devices they want (the PCI flavor). 
The current implementation needs enhancement at each step of the flow, but I really 
see no reason to have the Group. Yes, the 'PCI flavor' in fact creates groups based 
on PCI

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
Brian, the issue with 'class name' is that currently libvirt does not provide such 
information, otherwise we would be glad to add it :(
But this is a good point and we have considered it already. One solution is to 
retrieve it through some code that reads the configuration space directly. But that's 
not so easy, especially considering that different platforms have different methods 
of getting at the configuration space. A workaround (at least as a first step) is to 
use a user-defined property, so that the user can define it through configuration.

The issue with udev is that it's Linux-specific, and it may even vary between 
distributions.
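
For what it's worth, on Linux the class code is also exposed through sysfs, so it can 
be read without parsing config space by hand; a minimal sketch (with the same 
Linux-only caveat as udev) follows:

import glob

def pci_class_codes():
    # read each PCI device's class code from sysfs (Linux-only)
    classes = {}
    for path in glob.glob("/sys/bus/pci/devices/*/class"):
        address = path.split("/")[-2]            # e.g. '0000:01:00.0'
        with open(path) as f:
            classes[address] = f.read().strip()  # e.g. '0x030000' for a VGA controller
    return classes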

Thanks
--jyh

From: Brian Schott [mailto:brian.sch...@nimbisservices.com]
Sent: Thursday, January 09, 2014 11:19 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Ian,

The idea of pci flavors is a great one, and using vendor_id and product_id makes 
sense, but I could see a case for adding the class name, such as 'VGA compatible 
controller'. Otherwise, slightly different generations of hardware will mean custom 
whitelist setups on each compute node.

01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)

On the flip side, vendor_id and product_id might not be sufficient.  Suppose I 
have two identical NICs, one for nova internal use and the second for guest 
tenants?  So, bus numbering may be required.

01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)

Some possible combinations:

# take 2 gpus
pci_passthrough_whitelist=[
 { "vendor_id":"NVIDIA Corporation G71", "product_id":"GeForce 7900 GTX", "name":"GPU"},
]

# only take the GPU on PCI 2
pci_passthrough_whitelist=[
 { "vendor_id":"NVIDIA Corporation G71", "product_id":"GeForce 7900 GTX", "bus_id":"02:", "name":"GPU"},
]

pci_passthrough_whitelist=[
 {"bus_id": "01:00.0", "name": "GPU"},
 {"bus_id": "02:00.0", "name": "GPU"},
]

pci_passthrough_whitelist=[
 {"class": "VGA compatible controller", "name": "GPU"},
]

pci_passthrough_whitelist=[
 { "product_id":"GeForce 7900 GTX", "name":"GPU"},
]

I know you guys are thinking of PCI devices, but any thought of mapping to 
something like udev rather than PCI? Supporting udev rules might be easier and 
more robust than making something up.

Brian

-
Brian Schott, CTO
Nimbis Services, Inc.
brian.sch...@nimbisservices.commailto:brian.sch...@nimbisservices.com
ph: 443-274-6064  fx: 443-274-6060



On Jan 9, 2014, at 12:47 PM, Ian Wells 
ijw.ubu...@cack.org.ukmailto:ijw.ubu...@cack.org.uk wrote:


I think I'm in agreement with all of this.  Nice summary, Robert.
It may not be where the work ends, but if we could get this done the rest is 
just refinement.

On 9 January 2014 17:49, Robert Li (baoli) 
ba...@cisco.commailto:ba...@cisco.com wrote:

Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
vendor_id:,product_id:}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={vendor_id:, product_id:, name:str}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:pci_passthrough:alias=name:count

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ vendor_id:,product_id:, 
name:str}]

By doing so, we eliminated the PCI alias. And we call the name in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
On 10 January 2014 15:30, John Garbutt j...@johngarbutt.com wrote:

 We seemed happy with the current system (roughly) around GPU passthrough:
 nova flavor-key three_GPU_attached_30GB set
 pci_passthrough:alias= large_GPU:1,small_GPU:2
 nova boot --image some_image --flavor three_GPU_attached_30GB some_name


Actually, I think we pretty solidly disagree on this point.  On the other
hand, Yongli's current patch (with pci_flavor in the whitelist) is pretty
OK.


 nova boot --flavor m1.large --image image_id
   --nic net-id=net-id-1
   --nic net-id=net-id-2,nic-type=fast

  --nic net-id=net-id-3,nic-type=fast vm-name


With flavor defined (wherever it's defined):

nova boot ..
   --nic net-id=net-id-1,pci-flavor=xxx  # ok, presumably defaults to PCI passthrough
   --nic net-id=net-id-1,pci-flavor=xxx,vnic-attach=macvtap  # ok
   --nic net-id=net-id-1  # ok - no flavor = vnic
   --nic port-id=port-id-1,pci-flavor=xxx  # ok, gets vnic-attach from port
   --nic port-id=port-id-1  # ok - no flavor = vnic



 or

 neutron port-create
   --fixed-ip subnet_id=subnet-id,ip_address=192.168.57.101
   --nic-type=slow | fast | foobar
   net-id
 nova boot --flavor m1.large --image image_id --nic port-id=port-id


No, I think not - specifically because flavors are a nova concept and not a
neutron one, so putting them on the port is inappropriate. Conversely,
vnic-attach is a Neutron concept (fine, nova implements it, but Neutron
tells it how) so I think it *is* a port field, and we'd just set it on the
newly created port when doing nova boot ..,vnic-attach=thing

2) Expand PCI alias information

{
    name: NIC_fast,
    sriov_info: {
        nic_type: fast,
        network_ids: [net-id-1, net-id-2]
    }
}


Why can't we use the flavor name in --nic (because multiple flavors might
be on one NIC type, I guess)?  Where does e.g. switch/port information go,
particularly since it's per-device (not per-group) and non-scheduling?

I think the issue here is that you assume we group by flavor, then add
extra info, then group into a NIC group.  But for a lot of use cases there
is information that differs on every NIC port, so it makes more sense to
add extra info to a device, then group into flavor and that can also be
used for the --nic.

network_ids is interesting, but this is a nova config file and network_ids
are (a) from Neutron (b) ephemeral, so we can't put them in config.  They
could be provider network names, but that's not the same thing as a neutron
network name and not easily discoverable, outside of Neutron i.e. before
scheduling.

Again, Yongli's current change with pci-flavor in the whitelist records leads
to a reasonable way to make this work here, I think; straightforward extra_info
would be fine (though it would perhaps be nice if it were easier to spot it as
being of a different type from the whitelist regex fields).
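
For illustration only (none of this is the agreed format; the key names and
values are invented for the example), a record that keeps the matching
expression visibly separate from the carried extra information might look like:

    pci_passthrough_whitelist=[
        { "match": { "vendor_id": "8086", "product_id": "10ed",
                     "address": "*:02:00.*" },
          "extra_info": { "group": "phynet1-vfs",
                          "physical_network": "phynet1" } }
    ]

Everything under "match" would be a regex-style matching field; everything
under "extra_info" would be carried with the device but never compared against
the hardware.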
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
Hey Yunhong,

The thing about 'group' and 'flavor' and 'whitelist' is that they once
meant distinct things (and I think we've been trying to reduce them back
from three things to two or one):

- group: equivalent devices at a host level - use any one, no-one will
care, because they're either identical or as near as makes no difference
- flavor: equivalent devices to an end user - we may re-evaluate our
offerings and group them differently on the fly
- whitelist: either 'something to match the devices you may assign'
(originally) or 'something to match the devices you may assign *and* put
them in the group' (in the group proposal)

Bearing in mind what you said about scheduling, and if we skip 'group' for
a moment, then can I suggest (or possibly restate, because your comments
are pointing in this direction):

- we allow extra information to be added at what is now the whitelisting
stage, that just gets carried around with the device
- when we're turning devices into flavors, we can also match on that extra
information if we want (which means we can tag up the devices on the
compute node if we like, according to taste, and then bundle them up by tag
to make flavors; or we can add Neutron specific information and ignore it
when making flavors)
- we would need to add a config param on the control host to decide which
flags to group on when doing the stats (and they would additionally be the
only params that would work for flavors, I think)
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
Ian, thanks for your reply. Please check comments prefix with [yjiang5].

Thanks
--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 12:17 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hey Yunhong,

The thing about 'group' and 'flavor' and 'whitelist' is that they once meant 
distinct things (and I think we've been trying to reduce them back from three 
things to two or one):

- group: equivalent devices at a host level - use any one, no-one will care, 
because they're either identical or as near as makes no difference
- flavor: equivalent devices to an end user - we may re-evaluate our offerings 
and group them differently on the fly
- whitelist: either 'something to match the devices you may assign' 
(originally) or 'something to match the devices you may assign *and* put them 
in the group' (in the group proposal)

[yjiang5] Really, thanks for the summary, and it is quite clear. So what is the 
object of 'equivalent devices at a host level'? Because 'equivalent devices *to 
an end user*' is the flavor, is it 'equivalent to the *scheduler*' or 
'equivalent to *xxx*'? If equivalent to the scheduler, then I'd take pci_stats 
as a flexible group for the scheduler, and I'd treat 'equivalent for the 
scheduler' as a restriction on 'equivalent to the end user' because of the 
performance issue; otherwise, it's needless.  Secondly, for your definition of 
'whitelist', I hesitate at your '*and*' because IMHO 'and' means two things 
mixed together; otherwise, we could state it in one simple sentence. For 
example, I prefer to have another configuration option to define the 'put 
devices in the group' part, or, if we extend it, to define extra information 
like a 'group name' for devices.

Bearing in mind what you said about scheduling, and if we skip 'group' for a 
moment, then can I suggest (or possibly restate, because your comments are 
pointing in this direction):
- we allow extra information to be added at what is now the whitelisting stage, 
that just gets carried around with the device
[yjiang5] For 'added at ... whitelisting stage', see my above statement about 
the configuration. However, if you do want to use the whitelist, I'm OK, but 
please keep in mind that it is two functionalities combined: the devices you 
may assign *and* the group name for those devices.

- when we're turning devices into flavors, we can also match on that extra 
information if we want (which means we can tag up the devices on the compute 
node if we like, according to taste, and then bundle them up by tag to make 
flavors; or we can add Neutron specific information and ignore it when making 
flavors)
[yjiang5] Agree. Currently we can only use vendor_id and device_id for the 
flavor/alias, but we can extend it to cover such extra information since it is 
now an API.

- we would need to add a config param on the control host to decide which flags 
to group on when doing the stats (and they would additionally be the only 
params that would work for flavors, I think)
[yjiang5] Agree. And this is achievable because we switch the flavor to be an 
API; then we can control the flavor creation process.
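
As a rough illustration of that grouping step (a sketch only, not nova's code;
the STATS_KEYS config value and the dict shape of the devices are assumptions
made for the example):

    # Group PCI devices into pci_stats pools keyed by a configurable set of
    # attributes -- the "flags to group on" mentioned in the bullet above.
    from collections import Counter

    STATS_KEYS = ['vendor_id', 'product_id', 'physical_network']  # from config

    def build_pci_stats(devices, keys=STATS_KEYS):
        """Return a {(attribute values): count} mapping for the scheduler."""
        pools = Counter()
        for dev in devices:
            # Devices missing an attribute fall into a pool keyed with None,
            # so the extra information stays optional.
            pools[tuple(dev.get(k) for k in keys)] += 1
        return pools

    devices = [
        {'vendor_id': '8086', 'product_id': '10ed', 'physical_network': 'phynet1'},
        {'vendor_id': '8086', 'product_id': '10ed', 'physical_network': 'phynet1'},
        {'vendor_id': '15b3', 'product_id': '1004', 'physical_network': 'phynet2'},
    ]
    print(build_pci_stats(devices))
    # e.g. Counter({('8086', '10ed', 'phynet1'): 2, ('15b3', '1004', 'phynet2'): 1})

Only the attributes listed in that config would then be usable when defining
flavors, which is the restriction noted in the bullet above.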

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
I have to use [yjiang5_1] prefix now :)

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 3:55 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

On 11 January 2014 00:04, Jiang, Yunhong 
yunhong.ji...@intel.com wrote:
[yjiang5] Really, thanks for the summary, and it is quite clear. So what is the 
object of 'equivalent devices at a host level'? Because 'equivalent devices *to 
an end user*' is the flavor, is it 'equivalent to the *scheduler*' or 
'equivalent to *xxx*'? If equivalent to the scheduler, then I'd take pci_stats 
as a flexible group for the scheduler

To the scheduler, indeed.  And with the group proposal the scheduler and end 
user equivalences are one and the same.
[yjiang5_1] Once we use that proposal, we lose the flexibility for 'end user' 
equivalences, and that's the reason I'm against the group :)


Secondly, for your definition of 'whitelist', I hesitate at your '*and*' 
because IMHO 'and' means two things mixed together; otherwise, we could state 
it in one simple sentence. For example, I prefer to have another configuration 
option to define the 'put devices in the group' part, or, if we extend it, to 
define extra information like a 'group name' for devices.

I'm not stating what we should do, or what the definitions should mean; I'm 
saying how they've been interpreted as we've discussed this in the past.  We've 
had issues in the past where we've had continuing difficulties in describing 
anything without coming back to a 'whitelist' (generally meaning 'matching 
expression'), as an actual 'whitelist' is implied, rather than separately 
required, in a grouping system.
 Bearing in mind what you said about scheduling, and if we skip 'group' for a 
moment, then can I suggest (or possibly restate, because your comments are 
pointing in this direction):
- we allow extra information to be added at what is now the whitelisting stage, 
that just gets carried around with the device
[yjiang5] For 'added at ... whitelisting stage', see my above statement about 
the configuration. However, if you do want to use the whitelist, I'm OK, but 
please keep in mind that it is two functionalities combined: the devices you 
may assign *and* the group name for those devices.

Indeed - which is in fact what we've been proposing all along.


- when we're turning devices into flavors, we can also match on that extra 
information if we want (which means we can tag up the devices on the compute 
node if we like, according to taste, and then bundle them up by tag to make 
flavors; or we can add Neutron specific information and ignore it when making 
flavors)
[yjiang5] Agree. Currently we can only use vendor_id and device_id for the 
flavor/alias, but we can extend it to cover such extra information since it is 
now an API.

- we would need to add a config param on the control host to decide which flags 
to group on when doing the stats (and they would additionally be the only 
params that would work for flavors, I think)
[yjiang5] Agree. And this is achievable because we switch the flavor to be an 
API; then we can control the flavor creation process.

OK - so if this is good then I think the question is how we could change the 
'pci_whitelist' parameter we have - which, as you say, should either *only* do 
whitelisting or be renamed - to allow us to add information.  Yongli has 
something along those lines but it's not flexible and it distinguishes poorly 
between which bits are extra information and which bits are matching 
expressions (and it's still called pci_whitelist) - but even with those 
criticisms it's very close to what we're talking about.  When we have that I 
think a lot of the rest of the arguments should simply resolve themselves.

[yjiang5_1] The reason it is not easy to find a flexible/distinguishable change 
to pci_whitelist is that it combines two things. So a stupid/naive solution in 
my head is: change it to a VERY generic name, 'pci_devices_information', and 
change the schema to an array of {'devices_property': regex exp, 'group_name': 
'g1'} dictionaries, where the devices_property expression can be 'address == 
xxx, vendor_id == xxx' (i.e. similar to the current whitelist), and we can 
squeeze more into pci_devices_information in future, like 'network_information' 
= xxx or the Neutron-specific information you required in a previous mail. All 
keys other than 'devices_property' become extra information, i.e. 
software-defined properties. This extra information will be carried with the 
PCI devices. Some implementation details: A) we can limit the acceptable keys, 
e.g. only support 'group_name' and 'network_id', or we can accept any keys 
other than the reserved ones (vendor_id, device_id, etc.). B) if a device 
matches 'devices_property' in several entries, raise an exception, or use the 
first one.
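
To make the shape of that proposal concrete, a sketch of such a configuration
might look like the following (the option name and keys are as proposed above;
the values are purely illustrative):

    pci_devices_information = [
        { 'devices_property': "vendor_id == '8086', device_id == '10ed'",
          'group_name': 'g1' },
        { 'devices_property': "address == '0000:02:00.*'",
          'group_name': 'g2', 'network_id': 'phynet1' }
    ]

Everything except 'devices_property' would be treated as software-defined extra
information and carried with the matched devices.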

[yjiang5_1] Another thing that needs discussion is, as you pointed out, that we 
would need to add a config

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Robert Li (baoli)
Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
vendor_id:,product_id:}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={vendor_id:, product_id:, name:str}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:pci_passthrough:alias=name:count

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ vendor_id:,product_id:, 
name:str}]

By doing so, we eliminated the PCI alias. And we call the name in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the --nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.
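
As a concrete end-to-end sketch of how that might look (the values
'phynet1-vfs' and 'm1.sriov' are invented for the example, the flavor-key
syntax reuses the existing extra_specs format quoted earlier in the thread, and
the 'pci-group' key on --nic is hypothetical since the exact option name has
not been settled):

    # Compute nodes (nova.conf): whitelist entries carry the PCI group name
    pci_passthrough_whitelist=[
        { "vendor_id": "8086", "product_id": "10ed", "name": "phynet1-vfs" }
    ]

    # Controller: a flavor requesting one device from that group
    nova flavor-key m1.sriov set pci_passthrough:alias=phynet1-vfs:1

    # Boot, naming the group on the --nic option (key name is hypothetical)
    nova boot --flavor m1.sriov --image some_image \
        --nic net-id=net-id-1,pci-group=phynet1-vfs vm-name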

Further, we are saying that we can define default PCI groups based on the PCI 
device's class.

For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's connected to 
a virtual switch, a nic that is connected to a physical switch, or a nic that 
is connected to a physical switch, but has a host macvtap device in between. 
The actual names of the choices are not important here, and can be debated.

I'm hoping that we can go over the above on Monday. But any comments are 
welcome by email.

Thanks,
Robert

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Ian Wells
I think I'm in agreement with all of this.  Nice summary, Robert.

It may not be where the work ends, but if we could get this done the rest
is just refinement.


On 9 January 2014 17:49, Robert Li (baoli) ba...@cisco.com wrote:

Hi Folks,

  With John joining the IRC, so far, we had a couple of productive
 meetings in an effort to come to consensus and move forward. Thanks John
 for doing that, and I appreciate everyone's effort to make it to the daily
 meeting. Let's reconvene on Monday.

  But before that, and based on our today's conversation on IRC, I'd like
 to say a few things. I think that first of all, we need to get agreement on
 the terminologies that we are using so far. With the current nova PCI
 passthrough

  PCI whitelist: defines all the available PCI passthrough devices
 on a compute node. pci_passthrough_whitelist=[{
 vendor_id:,product_id:}]
 PCI Alias: criteria defined on the controller node with which
 requested PCI passthrough devices can be selected from all the PCI
 passthrough devices available in a cloud.
 Currently it has the following format: 
 pci_alias={vendor_id:,
 product_id:, name:str}

 nova flavor extra_specs: request for PCI passthrough devices can
 be specified with extra_specs in the format for example:
 pci_passthrough:alias=name:count

  As you can see, currently a PCI alias has a name and is defined on the
 controller. The implications for it is that when matching it against the
 PCI devices, it has to match the vendor_id and product_id against all the
 available PCI devices until one is found. The name is only used for
 reference in the extra_specs. On the other hand, the whitelist is basically
 the same as the alias without a name.

  What we have discussed so far is based on something called PCI groups
 (or PCI flavors as Yongli puts it). Without introducing other complexities,
 and with a little change of the above representation, we will have
 something like:

 pci_passthrough_whitelist=[{ vendor_id:,product_id:,
 name:str}]

  By doing so, we eliminated the PCI alias. And we call the name in
 above as a PCI group name. You can think of it as combining the definitions
 of the existing whitelist and PCI alias. And believe it or not, a PCI group
 is actually a PCI alias. However, with that change of thinking, a lot of
 benefits can be harvested:

   * the implementation is significantly simplified
  * provisioning is simplified by eliminating the PCI alias
  * a compute node only needs to report stats with something like:
 PCI group name:count. A compute node processes all the PCI passthrough
 devices against the whitelist, and assign a PCI group based on the
 whitelist definition.
  * on the controller, we may only need to define the PCI group
 names. if we use a nova api to define PCI groups (could be private or
 public, for example), one potential benefit, among other things
 (validation, etc),  they can be owned by the tenant that creates them. And
 thus a wholesale of PCI passthrough devices is also possible.
  * scheduler only works with PCI group names.
  * request for PCI passthrough device is based on PCI-group
  * deployers can provision the cloud based on the PCI groups
  * Particularly for SRIOV, deployers can design SRIOV PCI groups
 based on network connectivities.

  Further, to support SRIOV, we are saying that PCI group names not only
 can be used in the extra specs, it can also be used in the —nic option and
 the neutron commands. This allows the most flexibilities and
 functionalities afforded by SRIOV.

  Further, we are saying that we can define default PCI groups based on
 the PCI device's class.

  For vnic-type (or nic-type), we are saying that it defines the link
 characteristics of the nic that is attached to a VM: a nic that's connected
 to a virtual switch, a nic that is connected to a physical switch, or a nic
 that is connected to a physical switch, but has a host macvtap device in
 between. The actual names of the choices are not important here, and can be
 debated.

  I'm hoping that we can go over the above on Monday. But any comments are
 welcome by email.

  Thanks,
 Robert


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Brian Schott
Ian,

The idea of pci flavors is a great and using vendor_id and product_id make 
sense, but I could see a case for adding the class name such as 'VGA compatible 
controller'. Otherwise, slightly different generations of hardware will mean 
custom whitelist setups on each compute node.  

01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)

On the flip side, vendor_id and product_id might not be sufficient.  Suppose I 
have two identical NICs, one for nova internal use and the second for guest 
tenants?  So, bus numbering may be required.  

01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)

Some possible combinations:

# take 2 gpus
pci_passthrough_whitelist=[
{ vendor_id:NVIDIA Corporation G71,product_id:GeForce 7900 GTX, 
name:GPU},
]

# only take the GPU on PCI 2
pci_passthrough_whitelist=[
{ vendor_id:NVIDIA Corporation G71,product_id:GeForce 7900 GTX, 
'bus_id': '02:', name:GPU},
]
pci_passthrough_whitelist=[
{bus_id: 01:00.0, name: GPU},
{bus_id: 02:00.0, name: GPU},
]

pci_passthrough_whitelist=[
{class: VGA compatible controller, name: GPU},
]

pci_passthrough_whitelist=[
{ product_id:GeForce 7900 GTX, name:GPU},
]

I know you guys are thinking of PCI devices, but any thought of mapping to 
something like udev rather than pci?  Supporting udev rules might be easier and 
more robust than making something up.

Brian

-
Brian Schott, CTO
Nimbis Services, Inc.
brian.sch...@nimbisservices.com
ph: 443-274-6064  fx: 443-274-6060



On Jan 9, 2014, at 12:47 PM, Ian Wells ijw.ubu...@cack.org.uk wrote:

 I think I'm in agreement with all of this.  Nice summary, Robert.
 
 It may not be where the work ends, but if we could get this done the rest is 
 just refinement.
 
 
 On 9 January 2014 17:49, Robert Li (baoli) ba...@cisco.com wrote:
 Hi Folks,
 
 
 With John joining the IRC, so far, we had a couple of productive meetings in 
 an effort to come to consensus and move forward. Thanks John for doing that, 
 and I appreciate everyone's effort to make it to the daily meeting. Let's 
 reconvene on Monday. 
 
 But before that, and based on our today's conversation on IRC, I'd like to 
 say a few things. I think that first of all, we need to get agreement on the 
 terminologies that we are using so far. With the current nova PCI passthrough
 
 PCI whitelist: defines all the available PCI passthrough devices on a 
 compute node. pci_passthrough_whitelist=[{
  vendor_id:,product_id:}] 
 PCI Alias: criteria defined on the controller node with which 
 requested PCI passthrough devices can be selected from all the PCI 
 passthrough devices available in a cloud. 
 Currently it has the following format: 
 pci_alias={vendor_id:,
  product_id:, name:str}
 
 nova flavor extra_specs: request for PCI passthrough devices can be 
 specified with extra_specs in the format for 
 example:pci_passthrough:alias=name:count
 
 As you can see, currently a PCI alias has a name and is defined on the 
 controller. The implications for it is that when matching it against the PCI 
 devices, it has to match the vendor_id and product_id against all the 
 available PCI devices until one is found. The name is only used for reference 
 in the extra_specs. On the other hand, the whitelist is basically the same as 
 the alias without a name.
 
 What we have discussed so far is based on something called PCI groups (or PCI 
 flavors as Yongli puts it). Without introducing other complexities, and with 
 a little change of the above representation, we will have something like:
 
 pci_passthrough_whitelist=[{ vendor_id:,product_id:,
  name:str}] 
 
 By doing so, we eliminated the PCI alias. And we call the name in above as 
 a PCI group name. You can think of it as combining the definitions of the 
 existing whitelist and PCI alias. And believe it or not, a PCI group is 
 actually a PCI alias. However, with that change of thinking, a lot of 
 benefits can be harvested:
 
  * the implementation is significantly simplified
  * provisioning is simplified by eliminating the PCI alias
  * a compute node only needs to report stats with something like: PCI 
 group name:count. A compute node processes all the PCI passthrough devices 
 against the whitelist, and assign a PCI group based on the whitelist 
 definition.
  * on the controller, we may only need to define the PCI group names. 
 if we use a nova api to define PCI groups (could be private or public, for 
 example), one potential benefit, among other things (validation, etc),  they 
 can be owned by the tenant that creates them. And thus a wholesale of PCI 
 passthrough devices is also possible.
  * scheduler only works with 

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Sandhya Dasu (sadasu)
Hi,
 One use case was brought up in today's meeting that I think is not valid.

It is the use case where all 3 vnic types : Virtio, direct and macvtap (the 
terms used in the meeting were slow, fast, faster/foobar) could be attached to 
the same VM.  The main difference between a direct and macvtap interface is 
that the former does not support live migration. So, attaching both direct and 
macvtap pci-passthrough interfaces to the same VM would mean that it cannot 
support live migration. In that case assigning the macvtap interface is in 
essence a waste.

So, it would be ideal to disallow such an assignment, or at least warn the user 
that the VM will then not be able to support live migration.  We can, however, 
still combine direct or macvtap pci-passthrough interfaces with virtio vnic 
types without issue.
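
A minimal sketch of that kind of check (illustrative only; it is not tied to
any actual nova/neutron code path, and the vnic type names are simply the ones
used in this thread):

    import logging

    LOG = logging.getLogger(__name__)

    def check_vnic_types(requested_vnic_types):
        """Warn when 'direct' and 'macvtap' are mixed on one instance: any
        'direct' interface already rules out live migration, so a 'macvtap'
        interface then gains nothing over a direct one."""
        types = set(requested_vnic_types)
        if 'direct' in types and 'macvtap' in types:
            LOG.warning("Instance mixes 'direct' and 'macvtap' interfaces: "
                        "the 'direct' interface prevents live migration, so "
                        "the 'macvtap' assignment is wasted.")
        return types

    check_vnic_types(['virtio', 'direct', 'macvtap'])  # emits the warning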

Thanks,
Sandhya

From: Ian Wells ijw.ubu...@cack.org.uk
Reply-To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Date: Thursday, January 9, 2014 12:47 PM
To: OpenStack Development Mailing List (not for usage questions) 
openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

I think I'm in agreement with all of this.  Nice summary, Robert.

It may not be where the work ends, but if we could get this done the rest is 
just refinement.


On 9 January 2014 17:49, Robert Li (baoli) 
ba...@cisco.com wrote:
Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
vendor_id:,product_id:}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={vendor_id:, product_id:, name:str}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:pci_passthrough:alias=name:count

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ vendor_id:,product_id:, 
name:str}]

By doing so, we eliminated the PCI alias. And we call the name in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the —nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.

Further, we are saying that we

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Ian Wells
On 9 January 2014 20:19, Brian Schott brian.sch...@nimbisservices.comwrote:

 Ian,

 The idea of pci flavors is a great and using vendor_id and product_id make
 sense, but I could see a case for adding the class name such as 'VGA
 compatible controller'. Otherwise, slightly different generations of
 hardware will mean custom whitelist setups on each compute node.


Personally, I think the important thing is to have a matching expression.
The more flexible the matching language, the better.
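
For instance (a sketch only, not nova's matching code; the attribute names
follow the ones discussed in this thread and the values are examples), a
matching expression could simply be a map of attribute to regex applied to a
device's discovered properties:

    import re

    def matches(expr, device):
        # Every attribute named in the expression must match the device.
        return all(re.fullmatch(pattern, device.get(attr, ''))
                   for attr, pattern in expr.items())

    expr = {'class': 'VGA compatible controller',
            'vendor_id': '10de',              # example value
            'address': r'0000:0[12]:00\.0'}   # only bus 01 or 02

    device = {'class': 'VGA compatible controller', 'vendor_id': '10de',
              'product_id': '0091', 'address': '0000:02:00.0'}

    print(matches(expr, device))  # True

Adding the device class or the PCI address as matchable attributes then needs
no change to the mechanism, only new keys.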

On the flip side, vendor_id and product_id might not be sufficient.
  Suppose I have two identical NICs, one for nova internal use and the
 second for guest tenants?  So, bus numbering may be required.

 01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900
 GTX] (rev a1)
 02:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900
 GTX] (rev a1)


I totally concur on this - with network devices in particular the PCI path
is important because you don't accidentally want to grab the Openstack
control network device ;)


 I know you guys are thinking of PCI devices, but any though of mapping to
 something like udev rather than pci?  Supporting udev rules might be easier
 and more robust rather than making something up.


Past experience has told me that udev rules are not actually terribly good,
which you soon discover when you have to write expressions like:

 SUBSYSTEM==net, KERNELS==:83:00.0, ACTION==add, NAME=eth8

which took me a long time to figure out and is self-documenting only in
that it has a recognisable PCI path in there, 'KERNELS' not being a
meaningful name to me.  And self-documenting is key to udev rules, because
there's not much information on the tag meanings otherwise.

I'm comfortable with having a match format that covers what we know and
copes with extension for when we find we're short a feature, and what we
have now is close to that.  Yes, it needs the class adding, we all agree,
and you should be able to match on PCI path, which you can't now, but it's
close.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Ian Wells
On 9 January 2014 22:50, Ian Wells ijw.ubu...@cack.org.uk wrote:

 On 9 January 2014 20:19, Brian Schott brian.sch...@nimbisservices.comwrote:
 On the flip side, vendor_id and product_id might not be sufficient.
  Suppose I have two identical NICs, one for nova internal use and the
 second for guest tenants?  So, bus numbering may be required.


 01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900
 GTX] (rev a1)
 02:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900
 GTX] (rev a1)


 I totally concur on this - with network devices in particular the PCI path
 is important because you don't accidentally want to grab the Openstack
 control network device ;)


Redundant statement is redundant.  Sorry, yes, this has been a pet bugbear
of mine.  It applies equally to provider networks on the networking side of
things, and, where Neutron is not your network device manager for a PCI
device, you may want several device groups bridged to different segments.
Network devices are one case of a category of device where there's
something about the device that you can't detect that means it's not
necessarily interchangeable with its peers.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Jiang, Yunhong
Robert, sorry that I'm not a fan of *your group* term. To me, *your group* 
mixes two things. It's an extra property provided by configuration, and it's 
also a not-very-flexible mechanism to select devices (you can only select 
devices based on the 'group name' property).


1)   A dynamic group is much better. For example, a user may want to select a 
GPU device based on vendor_id, or based on vendor_id+device_id. In other words, 
the user wants to create groups based on vendor_id, or vendor_id+device_id, and 
select devices from those groups.  John's proposal is very good: provide an API 
to create the PCI flavor (or alias). I prefer 'flavor' because it's more 
OpenStack style.



2)   As for the second thing of your 'group', I'd understand it as an extra 
property provided by configuration.  I don't think we should put it into the 
whitelist, which is there to configure the devices that are assignable.  I'd 
add another configuration option to provide extra attributes to devices. When 
nova compute is up, it will parse this configuration and add the attributes to 
the corresponding PCI devices. I don't think adding another configuration 
option will cause too much trouble for deployment. OpenStack already has a lot 
of configuration items :)



3)   I think we currently mix up the neutron and nova design. To me, Neutron 
SRIOV support is a user of nova PCI support. Thus we should first analyse the 
requirements from neutron PCI support on nova PCI support in a more generic 
way, and then we can discuss how to enhance the nova PCI support, or, if you 
want, redesign the nova PCI support. IMHO, if we don't consider networking, the 
current implementation should be OK.



4)   IMHO, the core of nova PCI support is the *PCI property*. 'Property' here 
means not only generic PCI properties like vendor_id, device_id and device 
type, and host-specific properties like the BDF address or the adjacent switch 
IP address, but also user-defined properties like neutron's physical network 
name, etc. And then it's about how to get these properties, how to select/group 
devices based on the properties, and how to store/fetch these properties.
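
A purely illustrative shape for such a per-device property record (the field
names beyond the standard PCI ones are assumptions made for the example):

    device = {
        # generic, discovered from the hardware
        'vendor_id': '8086', 'product_id': '10ed', 'dev_type': 'VF',
        # host-specific
        'address': '0000:02:00.1',     # BDF
        'switch_ip': '10.1.1.1',       # adjacent switch, example value
        # user/operator defined
        'physical_network': 'phynet1',
        'group_name': 'phynet1-vfs',
    }

Selection and grouping then become questions of which of these keys a matching
expression or a flavor is allowed to reference, and of where the record is
stored and fetched.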



Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Thursday, January 09, 2014 8:49 AM
To: OpenStack Development Mailing List (not for usage questions); Irena 
Berezovsky; Sandhya Dasu (sadasu); Jiang, Yunhong; Itzik Brown; 
j...@johngarbutt.com; He, Yongli
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
vendor_id:,product_id:}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={vendor_id:, product_id:, name:str}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:pci_passthrough:alias=name:count

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ vendor_id:,product_id:, 
name:str}]

By doing so, we eliminated the PCI alias. And we call the name in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Jiang, Yunhong
Hi Ian, when you say you're in agreement with all of this, do you agree with 
the 'group name', or with John's PCI flavor?
I'm against the PCI group and will send out a reply later.

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Thursday, January 09, 2014 9:47 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

I think I'm in agreement with all of this.  Nice summary, Robert.
It may not be where the work ends, but if we could get this done the rest is 
just refinement.

On 9 January 2014 17:49, Robert Li (baoli) 
ba...@cisco.com wrote:
Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
vendor_id:,product_id:}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={vendor_id:, product_id:, name:str}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:pci_passthrough:alias=name:count

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ vendor_id:,product_id:, 
name:str}]

By doing so, we eliminated the PCI alias. And we call the name in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the -nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.

Further, we are saying that we can define default PCI groups based on the PCI 
device's class.

For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's connected to 
a virtual switch, a nic that is connected to a physical switch, or a nic that 
is connected to a physical switch, but has a host macvtap device in between. 
The actual names of the choices are not important here, and can be debated.

I'm hoping that we can go over the above on Monday. But any comments are 
welcome by email.

Thanks,
Robert


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Jiang, Yunhong
BTW, I like the PCI flavor :)

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Thursday, January 09, 2014 10:41 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Ian, when you say you're in agreement with all of this, do you agree with 
the 'group name', or with John's PCI flavor?
I'm against the PCI group and will send out a reply later.

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Thursday, January 09, 2014 9:47 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

I think I'm in agreement with all of this.  Nice summary, Robert.
It may not be where the work ends, but if we could get this done the rest is 
just refinement.

On 9 January 2014 17:49, Robert Li (baoli) 
ba...@cisco.com wrote:
Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
vendor_id:,product_id:}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={vendor_id:, product_id:, name:str}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:pci_passthrough:alias=name:count

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ vendor_id:,product_id:, 
name:str}]

By doing so, we eliminated the PCI alias. And we call the name in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the -nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.

Further, we are saying that we can define default PCI groups based on the PCI 
device's class.

For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's connected to 
a virtual switch, a nic that is connected to a physical switch, or a nic that 
is connected to a physical switch, but has a host macvtap device in between. 
The actual names of the choices are not important here, and can be debated.

I'm hoping that we can go over the above on Monday. But any comments are 
welcome by email.

Thanks,
Robert


___
OpenStack-dev mailing

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread yongli he

On 2014-01-10 00:49, Robert Li (baoli) wrote:


Hi Folks,


HI, all

Basically I favor the pci-flavor style and am against messing with the 
whitelist. Please see my inline comments.





With John joining the IRC, so far, we had a couple of productive 
meetings in an effort to come to consensus and move forward. Thanks 
John for doing that, and I appreciate everyone's effort to make it to 
the daily meeting. Let's reconvene on Monday.


But before that, and based on our today's conversation on IRC, I'd 
like to say a few things. I think that first of all, we need to get 
agreement on the terminologies that we are using so far. With the 
current nova PCI passthrough


PCI whitelist: defines all the available PCI passthrough 
devices on a compute node. pci_passthrough_whitelist=[{ 
vendor_id:,product_id:}]
PCI Alias: criteria defined on the controller node with which 
requested PCI passthrough devices can be selected from all the PCI 
passthrough devices available in a cloud.
Currently it has the following format: 
pci_alias={vendor_id:, product_id:, name:str}
nova flavor extra_specs: request for PCI passthrough devices 
can be specified with extra_specs in the format for 
example:pci_passthrough:alias=name:count


As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against 
the PCI devices, it has to match the vendor_id and product_id against 
all the available PCI devices until one is found. The name is only 
used for reference in the extra_specs. On the other hand, the 
whitelist is basically the same as the alias without a name.


What we have discussed so far is based on something called PCI groups 
(or PCI flavors as Yongli puts it). Without introducing other 
complexities, and with a little change of the above representation, we 
will have something like:
pci_passthrough_whitelist=[{ vendor_id:,product_id:, 
name:str}]


By doing so, we eliminated the PCI alias. And we call the name in 
above as a PCI group name. You can think of it as combining the 
definitions of the existing whitelist and PCI alias. And believe it or 
not, a PCI group is actually a PCI alias. However, with that change of 
thinking, a lot of
The whitelist configuration is mostly local to a host, so only addresses belong 
there; in that respect John's proposal is good. Mixing the group into the 
whitelist means we make a global thing per-host style, which is probably wrong.



benefits can be harvested:

 * the implementation is significantly simplified

but it becomes more of a mess; please refer to my new patches already sent out.

 * provisioning is simplified by eliminating the PCI alias
The PCI alias provides a good way to define a globally referenceable name for 
PCI devices; we need this, and the same is true for John's pci-flavor.
 * a compute node only needs to report stats with something 
like: PCI group name:count. A compute node processes all the PCI 
passthrough devices against the whitelist, and assign a PCI group 
based on the whitelist definition.
Simplifying this seems good, but it does not really simplify; separating the 
local and the global is the natural simplification.
 * on the controller, we may only need to define the PCI group 
names. if we use a nova api to define PCI groups (could be private or 
public, for example), one potential benefit, among other things 
(validation, etc),  they can be owned by the tenant that creates them. 
And thus a wholesale of PCI passthrough devices is also possible.
This means you have to consult the controller to deploy your host; if we keep 
the whitelist local, we simplify the deployment.

 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI 
groups based on network connectivities.


Further, to support SRIOV, we are saying that PCI group names not only 
can be used in the extra specs, it can also be used in the —nic option 
and the neutron commands. This allows the most flexibilities and 
functionalities afforded by SRIOV.

I still feel using the alias/PCI flavor is the better solution.


Further, we are saying that we can define default PCI groups based on 
the PCI device's class.
Default grouping makes our conceptual model messier; pre-defining a global 
thing in the API and hard-coding it is not a good way, so I posted -2 for this.


For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's 
connected to a virtual switch, a nic that is connected to a physical 
switch, or a nic that is connected to a physical switch, but has a 
host macvtap device in between. The actual names of the choices are 
not important here, and can be debated.


I'm hoping that we can go over the above on Monday. But any comments 
are welcome by email.



Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-02 Thread John Garbutt
On 22 December 2013 12:07, Irena Berezovsky ire...@mellanox.com wrote:
 Hi Ian,

 My comments are inline

 I  would like to suggest to focus the next PCI-pass though IRC meeting on:

 1.Closing the administration and tenant that powers the VM use
 cases.

 2.   Decouple the nova and neutron parts to start focusing on the
 neutron related details.

When is the next meeting?

I have lost track due to holidays, etc.

John

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-02 Thread Robert Li (baoli)
Hi John,

We had one on 12/14/2013 with the log:

http://eavesdrop.openstack.org/meetings/pci_passthrough_meeting/2013/pci_pa
ssthrough_meeting.2013-12-24-14.02.log.html

The next one will be at UTC 1400 on Jan. 7th, Tuesday.


--Robert

On 1/2/14 10:06 AM, John Garbutt j...@johngarbutt.com wrote:

On 22 December 2013 12:07, Irena Berezovsky ire...@mellanox.com wrote:
 Hi Ian,

 My comments are inline

 I  would like to suggest to focus the next PCI-pass though IRC meeting
on:

 1.Closing the administration and tenant that powers the VM use
 cases.

 2.   Decouple the nova and neutron parts to start focusing on the
 neutron related details.

When is the next meeting?

I have lost track due to holidays, etc.

John

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-23 Thread Jose Gavine Cueto
Hi,

I would just like to share my idea on somehow managing sr-iov networking
attributes in neutron (e.g. mac addr, ip addr, vlan).  I've had experience
implementing this and that was before pci-passthrough feature in nova
existed.  Basically, nova still did the plugging and the unplugging of vifs
and neutron did all the provisioning of networking attributes.  At that
time, the best hack I could do was to treat sr-iov nics as ordinary vifs that
were distinguishable by nova and neutron.  So to implement that, when
booting an instance in nova, a certain sr-iov-vf-specific extra_spec was
used (e.g. vfs := 1) indicating the number of sr-iov vfs to create and
eventually represented as mere vif objects in nova.  In nova, the sr-iov
vfs were represented as vifs but a special exception was made wherein
sr-iov vfs aren't really plugged, because of course it isn't necessary.  In
effect, the vifs that represent the vfs were accounted in the db including
its ip and mac addresses, and vlan tags.  With respect to l2 isolation, the
vlan tags were retrieved when booting the instance through neutron api and
were applied in libvirt xml.  To summarize, the networking attributes such
as ip and mac addresses and vlan tags were applied normally to vfs and thus
preserved the normal OS way of managing these like ordinary vifs.
 However, since its just a hack, some consequences and issues surfaced such
as, proper migration of these networking attributes weren't tested,
 libvirt seems to mistakenly swap the mac addresses when rebooting the
instances, and most importantly the vifs that represented the vfs lack
passthrough-specific information.  Since today OS already has this concept
of PCI-passthrough, I'm thinking this could be combined with the idea of a
vf that is represented by a vif to have a complete abstraction of a
manageable sr-iov vf.  I have not read thoroughly the preceeding replies,
so this idea might be redundant or irrelevant already.
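
As a rough illustration of the hack described above (field names are hypothetical, not the actual nova vif model):

# Illustrative only: a VF treated as an ordinary vif object, carrying the
# networking attributes neutron assigned plus the passthrough-specific
# details that the plain vif model lacked.
sriov_vf_as_vif = {
    'mac_address': 'fa:16:3e:11:22:33',
    'fixed_ips': [{'ip_address': '10.0.0.5'}],
    'vlan': 42,                      # applied in the libvirt xml at boot time
    'pci_address': '0000:81:10.2',   # the passthrough-specific information
    'plug': False,                   # VFs are not "plugged" like normal vifs
}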

Cheers,
Pepe


On Thu, Oct 17, 2013 at 4:32 AM, Irena Berezovsky ire...@mellanox.comwrote:

  Hi,

  As one of the next steps for PCI pass-through, I would like to discuss
  the support for PCI pass-through vNICs.

  While nova takes care of PCI pass-through device resource management and
  VIF settings, neutron should manage their networking configuration.

  I would like to register a summit proposal to discuss the support for PCI
  pass-through networking.

  I am not sure what would be the right topic to discuss the PCI
  pass-through networking, since it involves both nova and neutron.

  There is already a session registered by Yongli on a nova topic to discuss
  the PCI pass-through next steps.

  I think PCI pass-through networking is quite a big topic and it is worth
  having a separate discussion.

  Are there any other people who are interested in discussing it and sharing
  their thoughts and experience?



 Regards,

 Irena



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




-- 
To stop learning is like to stop loving.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-23 Thread Jay Pipes

On 12/17/2013 10:09 AM, Ian Wells wrote:

Reiterating from the IRC meeting, largely, so apologies.

Firstly, I disagree that
https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support is an
accurate reflection of the current state.  It's a very unilateral view,
largely because the rest of us had been focussing on the google document
that we've been using for weeks.

Secondly, I totally disagree with this approach.  This assumes that
description of the (cloud-internal, hardware) details of each compute
node is best done with data stored centrally and driven by an API.  I
don't agree with either of these points.

Firstly, the best place to describe what's available on a compute node
is in the configuration on the compute node.  For instance, I describe
which interfaces do what in Neutron on the compute node.  This is
because when you're provisioning nodes, that's the moment you know how
you've attached it to the network and what hardware you've put in it and
what you intend the hardware to be for - or conversely your deployment
puppet or chef or whatever knows it, and Razor or MAAS has enumerated
it, but the activities are equivalent.  Storing it centrally distances
the compute node from its descriptive information for no good purpose
that I can see and adds the complexity of having to go make remote
requests just to start up.

Secondly, even if you did store this centrally, it's not clear to me
that an API is very useful.  As far as I can see, the need for an API is
really the need to manage PCI device flavors.  If you want that to be
API-managed, then the rest of a (rather complex) API cascades from that
one choice.  Most of the things that API lets you change (expressions
describing PCI devices) are the sort of thing that you set once and only
revisit when you start - for instance - deploying new hosts in a
different way.

I'd point at the parallel in Neutron provider networks.  They're config driven,
largely on the compute hosts.  Agents know what ports on their machine
(the hardware tie) are associated with provider networks, by provider
network name.  The controller takes 'neutron net-create ...
--provider:network 'name'' and uses that to tie a virtual network to the
provider network definition on each host.  What we absolutely don't do
is have a complex admin API that lets us say 'in host aggregate 4,
provider network x (which I made earlier) is connected to eth6'.


FWIW, I could not agree more. The Neutron API already suffers from 
overcomplexity. There's really no need to make it even more complex than 
it already is, especially for a feature that more naturally fits in 
configuration data (Puppet/Chef/etc) and isn't something that you would 
really ever change for a compute host once set.


Best,
-jay

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-22 Thread Irena Berezovsky
Hi Ian,
My comments are inline
I would like to suggest focusing the next PCI pass-through IRC meeting on:

1. Closing the use cases for administration and for the tenant that powers the VM.

2. Decoupling the nova and neutron parts to start focusing on the neutron-related 
details.

BR,
Irena

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, December 20, 2013 2:50 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

On 19 December 2013 15:15, John Garbutt 
j...@johngarbutt.com wrote:
 Note, I don't see the person who boots the server ever seeing the pci-flavor, 
 only understanding the server flavor.
 [IrenaB] I am not sure that elaborating PCI device request into server flavor 
 is the right approach for the PCI pass-through network case. vNIC by its 
 nature is something dynamic that can be plugged or unplugged after VM boot. 
 server flavor is  quite static.
I was really just meaning that the server flavor would specify the type of NIC to attach.

The existing port specs, etc, define how many nics, and you can hot
plug as normal, just the VIF plugger code is told by the server flavor
if it is able to PCI passthrough, and which devices it can pick from.
The idea being that, combined with the neutron network-id, you know what to
plug.

The more I talk about this approach the more I hate it :(

The thinking we had here is that nova would provide a VIF or a physical NIC for 
each attachment.  Precisely what goes on here is a bit up for grabs, but I 
would think:
Nova specifies the type at port-update, making it obvious to Neutron it's 
getting a virtual interface or a passthrough NIC (and the type of that NIC, 
probably, and likely also the path so that Neutron can distinguish between NICs 
if it needs to know the specific attachment port)
Neutron does its magic on the network if it has any to do, like faffing(*) with 
switches
Neutron selects the VIF/NIC plugging type that Nova should use, and in the case 
that the NIC is a VF and it wants to set an encap, returns that encap back to 
Nova
Nova plugs it in and sets it up (in libvirt, this is generally in the XML; 
XenAPI and others are up for grabs).
[IrenaB] I agree on the described flow. Still need to close how to elaborate 
the request for pass-through vNIC into the  'nova boot'.
 We might also want a nic-flavor that tells neutron information it requires, 
 but lets get to that later...
 [IrenaB] nic flavor is definitely something that we need in order to choose 
 if  high performance (PCI pass-through) or virtio (i.e. OVS) nic will be 
 created.
Well, I think it's the right way to go, rather than overloading the server
flavor with hints about which PCI devices you could use.

The issue here is that additional attach.  Since for passthrough that isn't 
NICs (like crypto cards) you would almost certainly specify it in the flavor, 
if you did the same for NICs then you would have a preallocated pool of NICs 
from which to draw.  The flavor is also all you need to know for billing, and 
the flavor lets you schedule.  If you have it on the list of NICs, you have to 
work out how many physical NICs you need before you schedule (admittedly not 
hard, but not in keeping) and if you then did a subsequent attach it could fail 
because you have no more NICs on the machine you scheduled to - and at this 
point you're kind of stuck.

Also with the former, if you've run out of NICs, the already-extant resize call 
would allow you to pick a flavor with more NICs and you can then reschedule the 
subsequent VM to wherever resources are available to fulfil the new request.
[IrenaB] I still think that putting the PCI NIC request into the server flavor is 
not the right approach. You would need to create different server flavors for every 
possible combination of tenant network attachment options, or maybe assume the 
tenant is connecting to all of them. As for billing, you can use the type of vNIC 
in addition to packets in/out for billing per vNIC. This way, the tenant will be 
charged only for the vNICs actually used.
One question here is whether Neutron should become a provider of billed 
resources (specifically passthrough NICs) in the same way as Cinder is of 
volumes - something we'd not discussed to date; we've largely worked on the 
assumption that NICs are like any other passthrough resource, just one where, 
once it's allocated out, Neutron can work magic with it.
[IrenaB] I am not so familiar with Ceilometer, but seems that if we are talking 
about network resources, neutron should be in charge.

--
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread John Garbutt
Apologies for being late onto this thread, and not making the meeting
the other day.
Also apologies this is almost totally a top post.

On 17 December 2013 15:09, Ian Wells ijw.ubu...@cack.org.uk wrote:
 Firstly, I disagree that
 https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support is an accurate
 reflection of the current state.  It's a very unilateral view, largely
 because the rest of us had been focussing on the google document that we've
 been using for weeks.

I haven't seen the google doc. I got involved through the blueprint
review of this:
https://blueprints.launchpad.net/nova/+spec/pci-extra-info

I assume its this one?
https://docs.google.com/document/d/1EMwDg9J8zOxzvTnQJ9HwZdiotaVstFWKIuKrPse6JOs

On a quick read, my main concern is separating out the user more:
* administration (defines pci-flavor, defines which hosts can provide
it, defines server flavor...)
* person who boots server (picks server flavor, defines neutron ports)

Note, I don't see the person who boots the server ever seeing the
pci-flavor, only understanding the server flavor.

We might also want a nic-flavor that tells neutron information it
requires, but lets get to that later...

 Secondly, I totally disagree with this approach.  This assumes that
 description of the (cloud-internal, hardware) details of each compute node
 is best done with data stored centrally and driven by an API.  I don't agree
 with either of these points.

Possibly, but I would like to first agree on the use cases and data
model we want.

Nova has generally gone for APIs over config in recent times.
Mostly so you can do run-time configuration of the system.
But lets just see what makes sense when we have the use cases agreed.

 On 16 December 2013 22:27, Robert Li (baoli) wrote:
 I'd like to give you guy a summary of current state, let's discuss it
 then.
 https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support


 1)  fade out alias (i think this is ok for all)
 2)  white list became pci-flavor (i think this is ok for all)
 3)  address: simple regular expression support only: * and a number range
 [hex-hex] are supported. (i think this is ok?)
 4)  aggregate: now it's clear enough, and won't impact SRIOV.  (i think
 this is irrelevant to SRIOV now)

So... this means we have:

PCI-flavor:
* i.e. standardGPU, standardGPUnew, fastGPU, hdFlash1TB etc

Host mapping:
* decide which hosts you allow a particular flavor to be used
* note, the scheduler still needs to find out if any devices are free

flavor (of the server):
* usual RAM, CPU, Storage
* use extra specs to add PCI devices
* example:
** add one PCI device, choice of standardGPU or standardGPUnew
** also add: one hdFlash1TB
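
For concreteness, a rough sketch of that server-flavor example expressed as extra_specs (the key follows the Havana-era pci_passthrough:alias convention; treat the exact syntax as illustrative):

# e.g. one device from the standardGPU pci-flavor plus one hdFlash1TB device
extra_specs = {
    'pci_passthrough:alias': 'standardGPU:1,hdFlash1TB:1',
}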

Now, the other bit is SRIOV... At a high level:

Neutron:
* user wants to connect to a particular neutron network
* user wants a super-fast SRIOV connection

Administration:
* needs to map PCI devices to what neutron network they connect to

The big question is:
* is this a specific SRIOV only (provider) network
* OR... are other non-SRIOV connections also made to that same network

I feel we have to go for that latter. Imagine a network on VLAN 42,
you might want some SRIOV into that network, and some OVS connecting
into the same network. The user might have VMs connected using both
methods, so wants the same IP address ranges and same network id
spanning both.

If we go for the latter, we either need:
* some kind of nic-flavor
** boot ... -nic nic-id:public-id:,nic-flavor:10GBpassthrough
** but neutron could store nic-flavor, and pass it through to VIF
driver, and user says port-id
* OR add NIC config into the server flavor
** extra spec to say: tell the VIF driver it could use one of this list of
PCI devices: (list pci-flavors)
* OR do both

I vote for nic-flavor only, because it matches the volume-type we have
with cinder.

However, it does suggest that Nova should leave all the SRIOV work to
the VIF driver.
So the VIF driver, as activate by neutron, will understand which PCI
devices to passthrough.

Similar to the plan with brick, we could have an oslo lib that helps
you attach SRIOV devices that could be used by the neutron VIF drivers
and the nova PCI passthrough code.

Thanks,
John

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread Ian Wells
John:

 At a high level:

 Neutron:
 * user wants to connect to a particular neutron network
 * user wants a super-fast SRIOV connection

Administration:
 * needs to map PCI devices to what neutron network they connect to

The big question is:
 * is this a specific SRIOV only (provider) network
 * OR... are other non-SRIOV connections also made to that same network

 I feel we have to go for that latter. Imagine a network on VLAN 42,
 you might want some SRIOV into that network, and some OVS connecting
 into the same network. The user might have VMs connected using both
 methods, so wants the same IP address ranges and same network id
 spanning both.


 If we go for the latter, we either need:
 * some kind of nic-flavor
 ** boot ... -nic nic-id:public-id:,nic-flavor:10GBpassthrough
 ** but neutron could store nic-flavor, and pass it through to VIF
 driver, and user says port-id
 * OR add NIC config into the server flavor
 ** extra spec to say: tell the VIF driver it could use one of this list of
 PCI devices: (list pci-flavors)
 * OR do both

 I vote for nic-flavor only, because it matches the volume-type we have
 with cinder.


I think the issue there is that Nova is managing the supply of PCI devices
(which is limited and limited on a per-machine basis).  Indisputably you
need to select the NIC you want to use as a passthrough rather than a vnic
device, so there's something in the --nic argument, but you have to answer
two questions:

- how many devices do you need (which is now not a flavor property but in
the --nic list, which seems to me an odd place to be defining billable
resources)
- what happens when someone does nova interface-attach?

Cinder's an indirect parallel because the resources it's adding to the
hypervisor are virtual and unlimited, I think, or am I missing something
here?


 However, it does suggest that Nova should leave all the SRIOV work to
 the VIF driver.
 So the VIF driver, as activate by neutron, will understand which PCI
 devices to passthrough.

 Similar to the plan with brick, we could have an oslo lib that helps
 you attach SRIOV devices that could be used by the neutron VIF drivers
 and the nova PCI passthrough code.


I'm not clear that this is necessary.

At the moment with vNICs, you pass through devices by having a co-operation
between Neutron (which configures a way of attaching them to put them on a
certain network) and the hypervisor specific code (which creates them in
the instance and attaches them as instructed by Neutron).  Why would we not
follow the same pattern with passthrough devices?  In this instance,
neutron would tell nova that when it's plugging this device it should be a
passthrough device, and pass any additional parameters like the VF encap,
and Nova would do as instructed, then Neutron would reconfigure whatever
parts of the network need to be reconfigured in concert with the
hypervisor's settings to make the NIC a part of the specified network.
-- 
Ian.



 Thanks,
 John

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread John Garbutt
On 19 December 2013 12:21, Ian Wells ijw.ubu...@cack.org.uk wrote:

 John:

 At a high level:

 Neutron:
 * user wants to connect to a particular neutron network
 * user wants a super-fast SRIOV connection

 Administration:
  * needs to map PCI devices to what neutron network they connect to

 The big question is:
 * is this a specific SRIOV only (provider) network
 * OR... are other non-SRIOV connections also made to that same network

 I feel we have to go for that latter. Imagine a network on VLAN 42,
 you might want some SRIOV into that network, and some OVS connecting
 into the same network. The user might have VMs connected using both
 methods, so wants the same IP address ranges and same network id
 spanning both.


  If we go for the latter, we either need:
 * some kind of nic-flavor
 ** boot ... -nic nic-id:public-id:,nic-flavor:10GBpassthrough
 ** but neutron could store nic-flavor, and pass it through to VIF
 driver, and user says port-id
 * OR add NIC config into the server flavor
  ** extra spec to say: tell the VIF driver it could use one of this list of
 PCI devices: (list pci-flavors)
 * OR do both

 I vote for nic-flavor only, because it matches the volume-type we have
 with cinder.


 I think the issue there is that Nova is managing the supply of PCI devices
 (which is limited and limited on a per-machine basis).  Indisputably you
 need to select the NIC you want to use as a passthrough rather than a vnic
 device, so there's something in the --nic argument, but you have to answer
 two questions:

 - how many devices do you need (which is now not a flavor property but in
 the --nic list, which seems to me an odd place to be defining billable
 resources)
 - what happens when someone does nova interface-attach?

Agreed.

The --nic list specifies how many NICs.

I was suggesting adding a nic-flavor on each --nic spec to say if its
PCI passthrough vs virtual NIC.

 Cinder's an indirect parallel because the resources it's adding to the
 hypervisor are virtual and unlimited, I think, or am I missing something
 here?

I was referring more to the different volume-types, i.e. fast
volume or normal volume,
and how that is similar to virtual vs fast PCI passthrough vs slow
PCI passthrough.

Local volumes probably have the same issues as PCI passthrough with
finite resources.
But I am not sure we have a good solution for that yet.

Mostly, it seems right that Cinder and Neutron own the configuration
about the volume and network resources.

The VIF driver and volume drivers seem to have a similar sort of
relationship with Cinder and Neutron vs Nova.

Then the issue boils down to visibility into that data so we can
schedule efficiently, which is no easy problem.


 However, it does suggest that Nova should leave all the SRIOV work to
 the VIF driver.
 So the VIF driver, as activate by neutron, will understand which PCI
 devices to passthrough.

 Similar to the plan with brick, we could have an oslo lib that helps
  you attach SRIOV devices that could be used by the neutron VIF drivers
 and the nova PCI passthrough code.

 I'm not clear that this is necessary.

 At the moment with vNICs, you pass through devices by having a co-operation
 between Neutron (which configures a way of attaching them to put them on a
 certain network) and the hypervisor specific code (which creates them in the
 instance and attaches them as instructed by Neutron).  Why would we not
 follow the same pattern with passthrough devices?  In this instance, neutron
 would tell nova that when it's plugging this device it should be a
 passthrough device, and pass any additional parameters like the VF encap,
 and Nova would do as instructed, then Neutron would reconfigure whatever
 parts of the network need to be reconfigured in concert with the
 hypervisor's settings to make the NIC a part of the specified network.

I agree, in general terms.

Firstly, do you agree the neutron network-id can be used for
passthrough and non-passthrough VIF connections? i.e. a neutron
network-id does not imply PCI-passthrough.

Secondly, we need to agree on the information flow around defining the
flavor of the NIC. i.e. virtual or passthroughFast or
passthroughNormal.

My gut feeling is that neutron port description should somehow define
this via a nic-flavor that maps to a group of pci-flavors.

But from a billing point of view, I like the idea of the server flavor
saying to the VIF plug code: by the way, for this server, please
support all the nics using devices in pciflavor:fastNic, should that be
possible for the user's given port configuration. But this is leaking
neutron/networking information into Nova, which seems bad.

John

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread John Garbutt
On 19 December 2013 12:54, John Garbutt j...@johngarbutt.com wrote:
 On 19 December 2013 12:21, Ian Wells ijw.ubu...@cack.org.uk wrote:

 John:

 At a high level:

 Neutron:
 * user wants to connect to a particular neutron network
 * user wants a super-fast SRIOV connection

 Administration:
  * needs to map PCI devices to what neutron network they connect to

 The big question is:
 * is this a specific SRIOV only (provider) network
 * OR... are other non-SRIOV connections also made to that same network

 I feel we have to go for that latter. Imagine a network on VLAN 42,
 you might want some SRIOV into that network, and some OVS connecting
 into the same network. The user might have VMs connected using both
 methods, so wants the same IP address ranges and same network id
 spanning both.


  If we go for the latter, we either need:
 * some kind of nic-flavor
 ** boot ... -nic nic-id:public-id:,nic-flavor:10GBpassthrough
 ** but neutron could store nic-flavor, and pass it through to VIF
 driver, and user says port-id
 * OR add NIC config into the server flavor
  ** extra spec to say: tell the VIF driver it could use one of this list of
 PCI devices: (list pci-flavors)
 * OR do both

 I vote for nic-flavor only, because it matches the volume-type we have
 with cinder.


 I think the issue there is that Nova is managing the supply of PCI devices
 (which is limited and limited on a per-machine basis).  Indisputably you
 need to select the NIC you want to use as a passthrough rather than a vnic
 device, so there's something in the --nic argument, but you have to answer
 two questions:

 - how many devices do you need (which is now not a flavor property but in
 the --nic list, which seems to me an odd place to be defining billable
 resources)
 - what happens when someone does nova interface-attach?

 Agreed.

Apologies, I misread what you put, maybe we don't agree...

I am just trying not to make a passthrough NIC an odd special case.

In my mind, it should just be a regular neutron port connection that
happens to be implemented using PCI passthrough.

I agree we need to sort out the scheduling of that, because its a
finite resource.

 The --nic list specifies how many NICs.

 I was suggesting adding a nic-flavor on each --nic spec to say if its
 PCI passthrough vs virtual NIC.

 Cinder's an indirect parallel because the resources it's adding to the
 hypervisor are virtual and unlimited, I think, or am I missing something
 here?

 I was referring more to the different volume-types, i.e. fast
 volume or normal volume,
 and how that is similar to virtual vs fast PCI passthrough vs slow
 PCI passthrough.

 Local volumes probably have the same issues as PCI passthrough with
 finite resources.
 But I am not sure we have a good solution for that yet.

 Mostly, it seems right that Cinder and Neutron own the configuration
 about the volume and network resources.

 The VIF driver and volume drivers seem to have a similar sort of
 relationship with Cinder and Neutron vs Nova.

 Then the issue boils down to visibility into that data so we can
 schedule efficiently, which is no easy problem.


 However, it does suggest that Nova should leave all the SRIOV work to
 the VIF driver.
 So the VIF driver, as activate by neutron, will understand which PCI
 devices to passthrough.

 Similar to the plan with brick, we could have an oslo lib that helps
  you attach SRIOV devices that could be used by the neutron VIF drivers
 and the nova PCI passthrough code.

 I'm not clear that this is necessary.

 At the moment with vNICs, you pass through devices by having a co-operation
 between Neutron (which configures a way of attaching them to put them on a
 certain network) and the hypervisor specific code (which creates them in the
 instance and attaches them as instructed by Neutron).  Why would we not
 follow the same pattern with passthrough devices?  In this instance, neutron
 would tell nova that when it's plugging this device it should be a
 passthrough device, and pass any additional parameters like the VF encap,
 and Nova would do as instructed, then Neutron would reconfigure whatever
 parts of the network need to be reconfigured in concert with the
 hypervisor's settings to make the NIC a part of the specified network.

 I agree, in general terms.

 Firstly, do you agree the neutron network-id can be used for
 passthrough and non-passthrough VIF connections? i.e. a neutron
 network-id does not imply PCI-passthrough.

 Secondly, we need to agree on the information flow around defining the
 flavor of the NIC. i.e. virtual or passthroughFast or
 passthroughNormal.

 My gut feeling is that neutron port description should somehow define
 this via a nic-flavor that maps to a group of pci-flavors.

 But from a billing point of view, I like the idea of the server flavor
 saying to the VIF plug code, by the way, for this server, please
 support all the nics using devices in pciflavor:fastNic should that be
 possible for the users given port 

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread Irena Berezovsky
Hi John,
I totally agree that we should define the use cases both for administration and 
tenant that powers the VM.
Since we are trying to support PCI pass-through network, let's focus on the 
related use cases.
Please see my comments inline.

Regards,
Irena
-Original Message-
From: John Garbutt [mailto:j...@johngarbutt.com] 
Sent: Thursday, December 19, 2013 1:42 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Apologies for being late onto this thread, and not making the meeting the other 
day.
Also apologies this is almost totally a top post.

On 17 December 2013 15:09, Ian Wells ijw.ubu...@cack.org.uk wrote:
 Firstly, I disagree that
 https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support is an 
 accurate reflection of the current state.  It's a very unilateral 
 view, largely because the rest of us had been focussing on the google 
 document that we've been using for weeks.

I haven't seen the google doc. I got involved through the blueprint review of 
this:
https://blueprints.launchpad.net/nova/+spec/pci-extra-info

I assume its this one?
https://docs.google.com/document/d/1EMwDg9J8zOxzvTnQJ9HwZdiotaVstFWKIuKrPse6JOs

On a quick read, my main concern is separating out the user more:
* administration (defines pci-flavor, defines which hosts can provide it, 
defines server flavor...)
* person who boots server (picks server flavor, defines neutron ports)

Note, I don't see the person who boots the server ever seeing the pci-flavor, 
only understanding the server flavor.
[IrenaB] I am not sure that elaborating PCI device request into server flavor 
is the right approach for the PCI pass-through network case. vNIC by its nature 
is something dynamic that can be plugged or unplugged after VM boot. server 
flavor is  quite static.

We might also want a nic-flavor that tells neutron information it requires, 
but lets get to that later...
[IrenaB] nic flavor is definitely something that we need in order to choose if  
high performance (PCI pass-through) or virtio (i.e. OVS) nic will be created.

 Secondly, I totally disagree with this approach.  This assumes that 
 description of the (cloud-internal, hardware) details of each compute 
 node is best done with data stored centrally and driven by an API.  I 
 don't agree with either of these points.

Possibly, but I would like to first agree on the use cases and data model we 
want.

Nova has generally gone for APIs over config in recent times.
Mostly so you can do run-time configuration of the system.
But lets just see what makes sense when we have the use cases agreed.

 On 16 December 2013 22:27, Robert Li (baoli) wrote:
 I'd like to give you guy a summary of current state, let's discuss it 
 then.
 https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support


 1)  fade out alias (i think this is ok for all)
 2)  white list became pci-flavor (i think this is ok for all)
 3)  address: simple regular expression support only: * and a number 
 range [hex-hex] are supported. (i think this is ok?)
 4)  aggregate: now it's clear enough, and won't impact SRIOV.  (i 
 think this is irrelevant to SRIOV now)

So... this means we have:

PCI-flavor:
* i.e. standardGPU, standardGPUnew, fastGPU, hdFlash1TB etc

Host mapping:
* decide which hosts you allow a particular flavor to be used
* note, the scheduler still needs to find out if any devices are free

flavor (of the server):
* usual RAM, CPU, Storage
* use extra specs to add PCI devices
* example:
** add one PCI device, choice of standardGPU or standardGPUnew
** also add: one hdFlash1TB

Now, the other bit is SRIOV... At a high level:

Neutron:
* user wants to connect to a particular neutron network
* user wants a super-fast SRIOV connection

Administration:
* needs to map PCI devices to what neutron network they connect to

The big question is:
* is this a specific SRIOV only (provider) network
* OR... are other non-SRIOV connections also made to that same network

I feel we have to go for that latter. Imagine a network on VLAN 42, you might 
want some SRIOV into that network, and some OVS connecting into the same 
network. The user might have VMs connected using both methods, so wants the 
same IP address ranges and same network id spanning both.
[IrenaB] Agree. SRIOV connection is the choice for certain VM on certain 
network. The same VM can be connected to other network via virtio nic as well 
as other VMs can be connected to the same network via virtio nics.

If we go for the latter, we either need:
* some kind of nic-flavor
** boot ... -nic nic-id:public-id:,nic-flavor:10GBpassthrough
** but neutron could store nic-flavor, and pass it through to VIF driver, and 
user says port-id
* OR add NIC config into the server flavor
** extra spec to say: tell the VIF driver it could use one of this list of PCI 
devices: (list pci-flavors)
* OR do both

I vote for nic-flavor only, because it matches the volume-type we have

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread John Garbutt
Response inline...

On 19 December 2013 13:05, Irena Berezovsky ire...@mellanox.com wrote:
 Hi John,
 I totally agree that we should define the use cases both for administration 
 and tenant that powers the VM.
 Since we are trying to support PCI pass-through network, let's focus on the 
 related use cases.
 Please see my comments inline.

Cool.

 Regards,
 Irena
 -Original Message-
 From: John Garbutt [mailto:j...@johngarbutt.com]
 Sent: Thursday, December 19, 2013 1:42 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

 Apologies for being late onto this thread, and not making the meeting the 
 other day.
 Also apologies this is almost totally a top post.

 On 17 December 2013 15:09, Ian Wells ijw.ubu...@cack.org.uk wrote:
 Firstly, I disagree that
 https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support is an
 accurate reflection of the current state.  It's a very unilateral
 view, largely because the rest of us had been focussing on the google
 document that we've been using for weeks.

 I haven't seen the google doc. I got involved through the blueprint review of 
 this:
 https://blueprints.launchpad.net/nova/+spec/pci-extra-info

 I assume its this one?
 https://docs.google.com/document/d/1EMwDg9J8zOxzvTnQJ9HwZdiotaVstFWKIuKrPse6JOs

 On a quick read, my main concern is separating out the user more:
 * administration (defines pci-flavor, defines which hosts can provide it, 
 defines server flavor...)
 * person who boots server (picks server flavor, defines neutron ports)

 Note, I don't see the person who boots the server ever seeing the pci-flavor, 
 only understanding the server flavor.
 [IrenaB] I am not sure that elaborating PCI device request into server flavor 
 is the right approach for the PCI pass-through network case. vNIC by its 
 nature is something dynamic that can be plugged or unplugged after VM boot. 
 server flavor is  quite static.

I was really just meaning that the server flavor would specify the type of NIC to attach.

The existing port specs, etc, define how many nics, and you can hot
plug as normal, just the VIF plugger code is told by the server flavor
if it is able to PCI passthrough, and which devices it can pick from.
The idea being that, combined with the neutron network-id, you know what to
plug.

The more I talk about this approach the more I hate it :(

 We might also want a nic-flavor that tells neutron information it requires, 
 but lets get to that later...
 [IrenaB] nic flavor is definitely something that we need in order to choose 
 if  high performance (PCI pass-through) or virtio (i.e. OVS) nic will be 
 created.

Well, I think it's the right way to go, rather than overloading the server
flavor with hints about which PCI devices you could use.

 Secondly, I totally disagree with this approach.  This assumes that
 description of the (cloud-internal, hardware) details of each compute
 node is best done with data stored centrally and driven by an API.  I
 don't agree with either of these points.

 Possibly, but I would like to first agree on the use cases and data model we 
 want.

 Nova has generally gone for APIs over config in recent times.
 Mostly so you can do run-time configuration of the system.
 But lets just see what makes sense when we have the use cases agreed.

 On 16 December 2013 22:27, Robert Li (baoli) wrote:
 I'd like to give you guy a summary of current state, let's discuss it
 then.
 https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support


 1)  fade out alias (i think this is ok for all)
 2)  white list became pci-flavor (i think this is ok for all)
 3)  address: simple regular expression support only: * and a number
 range [hex-hex] are supported. (i think this is ok?)
 4)  aggregate: now it's clear enough, and won't impact SRIOV.  (i
 think this is irrelevant to SRIOV now)

 So... this means we have:

 PCI-flavor:
 * i.e. standardGPU, standardGPUnew, fastGPU, hdFlash1TB etc

 Host mapping:
 * decide which hosts you allow a particular flavor to be used
 * note, the scheduler still needs to find out if any devices are free

 flavor (of the server):
 * usual RAM, CPU, Storage
 * use extra specs to add PCI devices
 * example:
 ** add one PCI device, choice of standardGPU or standardGPUnew
 ** also add: one hdFlash1TB

 Now, the other bit is SRIOV... At a high level:

 Neutron:
 * user wants to connect to a particular neutron network
 * user wants a super-fast SRIOV connection

 Administration:
  * needs to map PCI devices to what neutron network they connect to

 The big question is:
 * is this a specific SRIOV only (provider) network
 * OR... are other non-SRIOV connections also made to that same network

 I feel we have to go for that latter. Imagine a network on VLAN 42, you might 
 want some SRIOV into that network, and some OVS connecting into the same 
 network. The user might have VMs connected using both methods, so wants the 
 same IP address ranges and same

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread Ian Wells
On 19 December 2013 15:15, John Garbutt j...@johngarbutt.com wrote:

  Note, I don't see the person who boots the server ever seeing the
 pci-flavor, only understanding the server flavor.
   [IrenaB] I am not sure that elaborating PCI device request into server
 flavor is the right approach for the PCI pass-through network case. vNIC by
 its nature is something dynamic that can be plugged or unplugged after VM
 boot. server flavor is  quite static.

 I was really just meaning that the server flavor would specify the type of
 NIC to attach.

 The existing port specs, etc, define how many nics, and you can hot
 plug as normal, just the VIF plugger code is told by the server flavor
 if it is able to PCI passthrough, and which devices it can pick from.
 The idea being that, combined with the neutron network-id, you know what to
 plug.

 The more I talk about this approach the more I hate it :(


The thinking we had here is that nova would provide a VIF or a physical NIC
for each attachment.  Precisely what goes on here is a bit up for grabs,
but I would think:

Nova specifies the type at port-update, making it obvious to Neutron it's
getting a virtual interface or a passthrough NIC (and the type of that NIC,
probably, and likely also the path so that Neutron can distinguish between
NICs if it needs to know the specific attachment port)
Neutron does its magic on the network if it has any to do, like faffing(*)
with switches
Neutron selects the VIF/NIC plugging type that Nova should use, and in the
case that the NIC is a VF and it wants to set an encap, returns that encap
back to Nova
Nova plugs it in and sets it up (in libvirt, this is generally in the XML;
XenAPI and others are up for grabs).
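
A hedged sketch of that exchange (binding:host_id, binding:profile and binding:vif_type are the existing port-binding attributes; the SRIOV-specific keys and values here are illustrative only):

# What Nova might send at port-update:
port_update = {'port': {
    'binding:host_id': 'compute-17',
    'binding:profile': {'vnic_type': 'direct',       # passthrough NIC
                        'pci_slot': '0000:81:10.2'},  # specific attachment port
}}

# ...and what Neutron might hand back so Nova knows how to plug:
binding_result = {
    'binding:vif_type': 'hostdev',      # illustrative plugging type
    'binding:profile': {'vlan': 42},    # e.g. the VF encap to set
}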

  We might also want a nic-flavor that tells neutron information it
 requires, but lets get to that later...
  [IrenaB] nic flavor is definitely something that we need in order to
 choose if  high performance (PCI pass-through) or virtio (i.e. OVS) nic
 will be created.

 Well, I think it's the right way to go, rather than overloading the server
 flavor with hints about which PCI devices you could use.


The issue here is that additional attach.  Since for passthrough that isn't
NICs (like crypto cards) you would almost certainly specify it in the
flavor, if you did the same for NICs then you would have a preallocated
pool of NICs from which to draw.  The flavor is also all you need to know
for billing, and the flavor lets you schedule.  If you have it on the list
of NICs, you have to work out how many physical NICs you need before you
schedule (admittedly not hard, but not in keeping) and if you then did a
subsequent attach it could fail because you have no more NICs on the
machine you scheduled to - and at this point you're kind of stuck.

Also with the former, if you've run out of NICs, the already-extant resize
call would allow you to pick a flavor with more NICs and you can then
reschedule the subsequent VM to wherever resources are available to fulfil
the new request.

One question here is whether Neutron should become a provider of billed
resources (specifically passthrough NICs) in the same way as Cinder is of
volumes - something we'd not discussed to date; we've largely worked on the
assumption that NICs are like any other passthrough resource, just one
where, once it's allocated out, Neutron can work magic with it.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-11-02 Thread Isaku Yamahata
Port profile is a generic way for Neutron to pass plugin-specific data
as a dictionary. The Cisco plugin uses it to pass VM-FEX specific data.
Robert, correct me if I'm wrong.
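
A minimal illustration of that dictionary (the key name is hypothetical; it is simply whatever the plugin expects):

binding_profile = {'profileid': 'vmfex-port-profile-1'}   # plugin-specific data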

thanks,
---
Isaku Yamahata isaku.yamah...@gmail.com


On Thu, Oct 31, 2013 at 10:21:20PM +,
Jiang, Yunhong yunhong.ji...@intel.com wrote:

 Robert, I think your change request for pci alias should be covered by the 
 extra info enhancement. 
 https://blueprints.launchpad.net/nova/+spec/pci-extra-info  and Yongli is 
 working on it.
 
 I'm not sure how the port profile is passed to the connected switch; is it a 
 Cisco VM-FEX specific method or a libvirt method? Sorry, I'm not well versed 
 on the network side.
 
 --jyh
 
 From: Robert Li (baoli) [mailto:ba...@cisco.com]
 Sent: Wednesday, October 30, 2013 10:13 AM
 To: Irena Berezovsky; Jiang, Yunhong; prashant.upadhy...@aricent.com; 
 chris.frie...@windriver.com; He, Yongli; Itzik Brown
 Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
 (kmestery); Sandhya Dasu (sadasu)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
 
 Hi,
 
 Regarding physical network mapping,  This is what I thought.
 
 consider the following scenarios:
1. a compute node with SRIOV only interfaces attached to a physical 
 network. the node is connected to one upstream switch
2. a compute node with both SRIOV interfaces and non-SRIOV interfaces 
 attached to a physical network. the node is connected to one upstream switch
3. in addition to case 1 2, a compute node may have multiple vNICs that 
 are connected to different upstream switches.
 
 CASE 1:
  -- the mapping from a virtual network (in terms of neutron) to a physical 
 network is actually done by binding a port profile to a neutron port. With 
 cisco's VM-FEX, a port profile is associated with one or multiple vlans. Once 
 the neutron port is bound with this port-profile in the upstream switch, it's 
 effectively plugged into the physical network.
  -- since the compute node is connected to one upstream switch, the existing 
 nova PCI alias will be sufficient. For example, one can boot a Nova instance 
 that is attached to a SRIOV port with the following command:
   nova boot --flavor m1.large --image image-id --nic 
 net-id=net,pci-alias=alias,sriov=direct|macvtap,port-profile=profile
 the net-id will be useful in terms of allocating IP address, enable dhcp, 
 etc that is associated with the network.
 -- the pci-alias specified in the nova boot command is used to create a PCI 
 request for scheduling purpose. a PCI device is bound to a neutron port 
 during the instance build time in the case of nova boot. Before invoking the 
 neutron API to create a port, an allocated PCI device out of a PCI alias will 
 be located from the PCI device list object. This device info among other 
 information will be sent to neutron to create the port.
 
 CASE 2:
 -- Assume that OVS is used for the non-SRIOV interfaces. An example of 
 configuration with ovs plugin would look like:
 bridge_mappings = physnet1:br-vmfex
 network_vlan_ranges = physnet1:15:17
 tenant_network_type = vlan
 When a neutron network is created, a vlan is either allocated or 
 specified in the neutron net-create command. Attaching a physical interface 
 to the bridge (in the above example br-vmfex) is an administrative task.
 -- to create a Nova instance with non-SRIOV port:
    nova boot --flavor m1.large --image image-id --nic net-id=net
 -- to create a Nova instance with SRIOV port:
    nova boot --flavor m1.large --image image-id --nic 
 net-id=net,pci-alias=alias,sriov=direct|macvtap,port-profile=profile
 it's essentially the same as in the first case. But since the net-id is 
 already associated with a vlan, the vlan associated with the port-profile 
 must be identical to that vlan. This has to be enforced by neutron.
 again, since the node is connected to one upstream switch, the existing 
 nova PCI alias should be sufficient.
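
A minimal sketch of the vlan check neutron would have to enforce in this case (function and argument names are hypothetical):

def validate_port_profile_vlan(network_vlan, profile_vlans):
    # the vlan already bound to the neutron network must be among the vlans
    # carried by the requested port-profile
    if network_vlan not in profile_vlans:
        raise ValueError("port-profile vlans %s do not include the network "
                         "vlan %s" % (profile_vlans, network_vlan))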
 
 CASE 3:
 -- A compute node might be connected to multiple upstream switches, with each 
 being a separate network. This means SRIOV PFs/VFs are already implicitly 
 associated with physical networks. In the non-SRIOV case, a physical 
 interface is associated with a physical network by plugging it into that 
 network, and attaching this interface to the ovs bridge that represents this 
 physical network on the compute node. In the SRIOV case, we need a way to 
 group the SRIOV VFs that belong to the same physical networks. The existing 
 nova PCI alias is there to facilitate PCI device allocation by associating 
 product_id and vendor_id with an alias name. This will no longer be 
 sufficient. But it can be enhanced to achieve our goal. For example, the PCI 
 device domain and bus (if their mapping to vNIC is fixed across boots) may be 
 added into the alias, and the alias name should correspond to a list of 
 tuples.
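
An illustrative sketch of such an enhanced alias (all field names are hypothetical): the alias name maps to a list of matching entries, extended with PCI address fields and the physical network the VFs are wired to.

pci_alias = {
    'physnet1-vfs': [
        {'vendor_id': '8086', 'product_id': '10ed',
         'domain': '0000', 'bus': '81',
         'physical_network': 'physnet1'},
    ],
}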
 
 Another consideration

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-30 Thread Isaku Yamahata
On Wed, Oct 30, 2013 at 04:14:40AM +,
Jiang, Yunhong yunhong.ji...@intel.com wrote:

  But how about long term direction?
  Neutron should know/manage such network related resources on
  compute nodes?
 
 So you mean the PCI device management will be split between Nova and 
 Neutron? For example, non-NIC device owned by nova and NIC device owned by 
 neutron?

Yes. But I'd like to hear from other Neutron developers.


 There have been so many discussion of the scheduler enhancement, like 
 https://etherpad.openstack.org/p/grizzly-split-out-scheduling , so possibly 
 that's the right direction? Let's wait for the summit discussion.

Interesting. Yeah, I look forward for the summit discussion.
Let's try to involve not only Nova developers, but also other Neutron
developers.

thanks,
-- 
Isaku Yamahata isaku.yamah...@gmail.com

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Robert Li (baoli)
.



Regards,
Irena

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Friday, October 25, 2013 11:16 PM
To: prashant.upadhy...@aricent.com; Irena Berezovsky; yunhong.ji...@intel.com; 
chris.frie...@windriver.com; yongli...@intel.com
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

This is Robert Li from Cisco Systems. Recently, I was tasked to investigate 
such support for Cisco's systems that support VM-FEX, which is a SRIOV 
technology supporting 802.1Qbh. I was able to bring up nova instances with 
SRIOV interfaces, and establish networking in between the instances that 
employ the SRIOV interfaces. Certainly, this was accomplished with hacking 
and some manual intervention. Based on this experience and my study with the 
two existing nova pci-passthrough blueprints that have been implemented and 
committed into Havana 
(https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base and
https://blueprints.launchpad.net/nova/+spec/pci-passthrough-libvirt),  I 
registered a couple of blueprints (one on Nova side, the other on the Neutron 
side):

https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov
https://blueprints.launchpad.net/neutron/+spec/pci-passthrough-sriov

in order to address SRIOV support in openstack.

Please take a look at them and see if they make sense, and let me know any 
comments and questions. We can also discuss this in the summit, I suppose.

I noticed that there is another thread on this topic, so copy those folks  from 
that thread as well.

thanks,
Robert

On 10/16/13 4:32 PM, Irena Berezovsky 
ire...@mellanox.com wrote:

Hi,
As one of the next steps for PCI pass-through, I would like to discuss the 
support for PCI pass-through vNICs.
While nova takes care of PCI pass-through device resource management and VIF 
settings, neutron should manage their networking configuration.
I would like to register a summit proposal to discuss the support for PCI 
pass-through networking.
I am not sure what would be the right topic to discuss the PCI pass-through 
networking, since it involves both nova and neutron.
There is already a session registered by Yongli on a nova topic to discuss the 
PCI pass-through next steps.
I think PCI pass-through networking is quite a big topic and it is worth having 
a separate discussion.
Are there any other people who are interested in discussing it and sharing 
their thoughts and experience?

Regards,
Irena

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong
Robert, is it possible to have an IRC meeting? I'd prefer an IRC meeting because 
it's more openstack style and it also keeps the minutes clearly recorded.

To your flow, can you give a more detailed example? For example, consider a user 
specifying the instance with a -nic option that specifies a network id; how does 
nova derive the requirement for the PCI device? I assume the network id should 
define the switches that the device can connect to, but how is that 
information translated to the PCI property requirement? Will this translation 
happen before the nova scheduler makes the host decision?

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; prashant.upadhy...@aricent.com; Jiang, Yunhong; 
chris.frie...@windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

Thank you very much for your comments. See inline.

--Robert

On 10/27/13 3:48 AM, Irena Berezovsky 
ire...@mellanox.com wrote:

Hi Robert,
Thank you very much for sharing the information regarding your efforts. Can you 
please share your idea of the end to end flow? How do you suggest  to bind Nova 
and Neutron?

The end to end flow is actually encompassed in the blueprints in a nutshell. I 
will reiterate it below. The binding between Nova and Neutron occurs with 
the neutron v2 API that nova invokes in order to provision the neutron 
services. The vif driver is responsible for plugging in an instance onto the 
networking setup that neutron has created on the host.

Normally, one will invoke nova boot api with the -nic options to specify the 
nic with which the instance will be connected to the network. It currently 
allows net-id, fixed ip and/or port-id to be specified for the option. However, 
it doesn't allow one to specify special networking requirements for the 
instance. Thanks to the nova pci-passthrough work, one can specify PCI 
passthrough device(s) in the nova flavor. But it doesn't provide means to tie 
up these PCI devices in the case of ethernet adapters with networking services. 
Therefore the idea is actually simple as indicated by the blueprint titles, to 
provide means to tie up SRIOV devices with neutron services. A work flow would 
roughly look like this for 'nova boot':

  -- Specifies networking requirements in the -nic option. Specifically for 
SRIOV, allow the following to be specified in addition to the existing required 
information:
   . PCI alias
   . direct pci-passthrough/macvtap
   . port profileid that is compliant with 802.1Qbh

The above information is optional. In the absence of them, the existing 
behavior remains.

 -- if special networking requirements exist, Nova api creates PCI requests 
in the nova instance type for scheduling purpose

 -- Nova scheduler schedules the instance based on the requested flavor 
plus the PCI requests that are created for networking.

 -- Nova compute invokes neutron services with PCI passthrough information 
if any

 --  Neutron performs its normal operations based on the request, such as 
allocating a port, assigning ip addresses, etc. Specific to SRIOV, it should 
validate the information such as profileid, and stores them in its db. It's 
also possible to associate a port profileid with a neutron network so that port 
profileid becomes optional in the -nic option. Neutron returns  nova the port 
information, especially for PCI passthrough related information in the port 
binding object. Currently, the port binding object contains the following 
information:
  binding:vif_type
  binding:host_id
  binding:profile
  binding:capabilities

-- nova constructs the domain xml and plugs in the instance by calling the 
vif driver. The vif driver can build up the interface xml based on the port 
binding information.
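
A hedged sketch of that last step, i.e. a vif driver building the libvirt interface xml from the port binding information (the binding keys used here are illustrative, not a final schema):

def build_sriov_interface_xml(vif):
    profile = vif['binding:profile']
    return (
        "<interface type='hostdev' managed='yes'>\n"
        "  <mac address='%(mac)s'/>\n"
        "  <source>\n"
        "    <address type='pci' domain='0x%(domain)s' bus='0x%(bus)s'"
        " slot='0x%(slot)s' function='0x%(function)s'/>\n"
        "  </source>\n"
        "  <vlan><tag id='%(vlan)d'/></vlan>\n"
        "</interface>\n"
    ) % {'mac': vif['mac_address'],
         'domain': profile['domain'], 'bus': profile['bus'],
         'slot': profile['slot'], 'function': profile['function'],
         'vlan': profile['vlan']}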




The blueprints you registered make sense. On Nova side, there is a need to bind 
between requested virtual network and PCI device/interface to be allocated as 
vNIC.
On the Neutron side, there is a need to  support networking configuration of 
the vNIC. Neutron should be able to identify the PCI device/macvtap interface 
in order to apply configuration. I think it makes sense to provide neutron 
integration via dedicated Modular Layer 2 Mechanism Driver to allow PCI 
pass-through vNIC support along with other networking technologies.

I haven't sorted through this yet. A neutron port could be associated with a 
PCI device or not, which is a common feature, IMHO. However, a ML2 driver may 
be needed specific to a particular SRIOV technology.


During the Havana Release, we introduced Mellanox Neutron plugin that enables 
networking via SRIOV pass-through devices or macvtap interfaces.
We want to integrate

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Irena Berezovsky
Hi Jiang, Robert,
IRC meeting option works for me.
If I understand your question below, you are looking for a way to tie up 
between requested virtual network(s) and requested PCI device(s). The way we 
did it in our solution  is to map a provider:physical_network to an interface 
that represents the Physical Function. Every virtual network is bound to the 
provider:physical_network, so the PCI device should be allocated based on this 
mapping.  We can  map a PCI alias to the provider:physical_network.
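
A hedged sketch of that mapping (interface names are illustrative): each provider physical network is tied to the interface representing the Physical Function, and VFs are allocated according to it.

physical_network_to_pf = {
    'physnet1': 'eth2',   # VFs of eth2 serve virtual networks on physnet1
    'physnet2': 'eth3',
}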

Another topic to discuss is where the mapping between neutron port and PCI 
device should be managed. One way to solve it, is to propagate the allocated 
PCI device details to neutron on port creation.
In case  there is no qbg/qbh support, VF networking configuration should be 
applied locally on the Host.
The question is when and how to apply networking configuration on the PCI 
device?
We see the following options:

* It can be done on port creation.

* It can be done when the nova VIF driver is called for vNIC plugging. This 
will require having all networking configuration available to the VIF driver, 
or sending a request to the neutron server to obtain it.

* It can be done by having a dedicated L2 neutron agent on each Host that 
scans for allocated PCI devices and then retrieves networking configuration 
from the server and configures the device. The agent will also be responsible 
for managing update requests coming from the neutron server (a rough sketch of 
such an agent follows below).


For macvtap vNIC type assignment, the networking configuration can be applied 
by a dedicated L2 neutron agent.
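
A very rough sketch of the third option, the dedicated per-host agent (the 
function bodies are placeholders, not real Neutron agent code):

    import time

    def scan_allocated_pci_devices():
        """Placeholder: inspect libvirt domains / sysfs for VFs assigned
        to instances on this host and return their PCI addresses."""
        return []

    def get_port_config(pci_addr):
        """Placeholder: ask the neutron server for the networking
        configuration of the port bound to this device."""
        return {'vlan': 100, 'mac': 'fa:16:3e:00:00:01'}

    def apply_config(pci_addr, cfg):
        """Placeholder: apply VLAN/MAC settings on the VF, e.g. via
        'ip link set <pf> vf <n> vlan <vlan> mac <mac>'."""

    def agent_loop(poll_interval=2):
        configured = set()
        while True:
            for pci_addr in scan_allocated_pci_devices():
                if pci_addr not in configured:
                    apply_config(pci_addr, get_port_config(pci_addr))
                    configured.add(pci_addr)
            time.sleep(poll_interval)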

BR,
Irena

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Tuesday, October 29, 2013 9:04 AM

To: Robert Li (baoli); Irena Berezovsky; prashant.upadhy...@aricent.com; 
chris.frie...@windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support

Robert, is it possible to have a IRC meeting? I'd prefer to IRC meeting because 
it's more openstack style and also can keep the minutes clearly.

To your flow, can you give a more detailed example? For example, consider that 
the user specifies the instance with a -nic option giving a network id; how 
does nova then derive the requirement for the PCI device? I assume the network 
id should define the switches that the device can connect to, but how is that 
information translated into the PCI property requirement? Will this translation 
happen before the nova scheduler makes the host decision?

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; 
prashant.upadhy...@aricent.commailto:prashant.upadhy...@aricent.com; Jiang, 
Yunhong; chris.frie...@windriver.commailto:chris.frie...@windriver.com; He, 
Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

Thank you very much for your comments. See inline.

--Robert

On 10/27/13 3:48 AM, Irena Berezovsky 
ire...@mellanox.commailto:ire...@mellanox.com wrote:

Hi Robert,
Thank you very much for sharing the information regarding your efforts. Can you 
please share your idea of the end to end flow? How do you suggest  to bind Nova 
and Neutron?

The end to end flow is actually encompassed in the blueprints in a nutshell. I 
will reiterate it in below. The binding between Nova and Neutron occurs with 
the neutron v2 API that nova invokes in order to provision the neutron 
services. The vif driver is responsible for plugging in an instance onto the 
networking setup that neutron has created on the host.

Normally, one will invoke nova boot api with the -nic options to specify the 
nic with which the instance will be connected to the network. It currently 
allows net-id, fixed ip and/or port-id to be specified for the option. However, 
it doesn't allow one to specify special networking requirements for the 
instance. Thanks to the nova pci-passthrough work, one can specify PCI 
passthrough device(s) in the nova flavor. But it doesn't provide a means to tie 
up these PCI devices, in the case of ethernet adapters, with networking services. 
Therefore the idea is actually simple as indicated by the blueprint titles, to 
provide means to tie up SRIOV devices with neutron services. A work flow would 
roughly look like this for 'nova boot':

  -- Specifies networking requirements in the -nic option. Specifically for 
SRIOV, allow the following to be specified in addition to the existing required 
information:
   . PCI alias
   . direct pci-passthrough/macvtap
   . port profileid that is compliant with 802.1Qbh

The above information is optional. In its absence, the existing behavior 
remains (a sketch of the translation follows below).
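
As a rough illustration of that translation into a PCI request (the field 
names are invented to mirror the list above; this is not a proposed API):

    # A requested nic as nova api might see it after parsing the -nic option.
    # Only net-id exists today; the rest are the optional SRIOV additions
    # described above, with illustrative names.
    requested_nic = {
        'net-id': 'NET_UUID',
        'pci_alias': 'cisco_vf',              # optional
        'vnic_type': 'direct',                # 'direct' or 'macvtap', optional
        'port_profileid': 'my-port-profile',  # 802.1Qbh profile, optional
    }

    def nic_to_pci_request(nic):
        """If the nic carries SRIOV attributes, turn them into a PCI request
        for the scheduler; otherwise keep today's behavior and request
        nothing."""
        if 'pci_alias' not in nic:
            return None
        return {'alias': nic['pci_alias'], 'count': 1}

    print(nic_to_pci_request(requested_nic))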

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread John Garbutt
I would love to see a symmetry between Cinder local volumes and
Neutron PCI passthrough VIFs.

Not entirely sure I have that clear in my head right now, but I just
wanted to share the idea:
* describe resource external to nova that is attached to VM in the API
(block device mapping and/or vif references)
* ideally the nova scheduler needs to be aware of the local capacity,
and how that relates to the above information (relates to the cross
service scheduling issues)
* state of the device should be stored by Neutron/Cinder
(attached/detached, capacity, IP, etc), but still exposed to the
scheduler
* connection params get given to Nova from Neutron/Cinder
* nova still has the vif driver or volume driver to make the final connection
* the disk should be formatted/expanded, and network info injected in
the same way as before (cloud-init, config drive, DHCP, etc)

John

On 29 October 2013 10:17, Irena Berezovsky ire...@mellanox.com wrote:
 Hi Jiang, Robert,

 IRC meeting option works for me.

 If I understand your question below, you are looking for a way to tie up
 between requested virtual network(s) and requested PCI device(s). The way we
 did it in our solution  is to map a provider:physical_network to an
 interface that represents the Physical Function. Every virtual network is
 bound to the provider:physical_network, so the PCI device should be
 allocated based on this mapping.  We can  map a PCI alias to the
 provider:physical_network.



 Another topic to discuss is where the mapping between neutron port and PCI
 device should be managed. One way to solve it, is to propagate the allocated
 PCI device details to neutron on port creation.

 In case  there is no qbg/qbh support, VF networking configuration should be
 applied locally on the Host.

 The question is when and how to apply networking configuration on the PCI
 device?

 We see the following options:

 · it can be done on port creation.

 · It can be done when nova VIF driver is called for vNIC plugging.
 This will require to  have all networking configuration available to the VIF
 driver or send request to the neutron server to obtain it.

 · It can be done by  having a dedicated L2 neutron agent on each
 Host that scans for allocated PCI devices  and then retrieves networking
 configuration from the server and configures the device. The agent will be
 also responsible for managing update requests coming from the neutron
 server.



 For macvtap vNIC type assignment, the networking configuration can be
 applied by a dedicated L2 neutron agent.



 BR,

 Irena



 From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
 Sent: Tuesday, October 29, 2013 9:04 AM


 To: Robert Li (baoli); Irena Berezovsky; prashant.upadhy...@aricent.com;
 chris.frie...@windriver.com; He, Yongli; Itzik Brown


 Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery
 (kmestery); Sandhya Dasu (sadasu)
 Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network
 support



 Robert, is it possible to have a IRC meeting? I’d prefer to IRC meeting
 because it’s more openstack style and also can keep the minutes clearly.



 To your flow, can you give more detailed example. For example, I can
 consider user specify the instance with –nic option specify a network id,
 and then how nova device the requirement to the PCI device? I assume the
 network id should define the switches that the device can connect to , but
 how is that information translated to the PCI property requirement? Will
 this translation happen before the nova scheduler make host decision?



 Thanks

 --jyh



 From: Robert Li (baoli) [mailto:ba...@cisco.com]
 Sent: Monday, October 28, 2013 12:22 PM
 To: Irena Berezovsky; prashant.upadhy...@aricent.com; Jiang, Yunhong;
 chris.frie...@windriver.com; He, Yongli; Itzik Brown
 Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery
 (kmestery); Sandhya Dasu (sadasu)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support



 Hi Irena,



 Thank you very much for your comments. See inline.



 --Robert



 On 10/27/13 3:48 AM, Irena Berezovsky ire...@mellanox.com wrote:



 Hi Robert,

 Thank you very much for sharing the information regarding your efforts. Can
 you please share your idea of the end to end flow? How do you suggest  to
 bind Nova and Neutron?



 The end to end flow is actually encompassed in the blueprints in a nutshell.
 I will reiterate it in below. The binding between Nova and Neutron occurs
 with the neutron v2 API that nova invokes in order to provision the neutron
 services. The vif driver is responsible for plugging in an instance onto the
 networking setup that neutron has created on the host.



 Normally, one will invoke nova boot api with the —nic options to specify
 the nic with which the instance will be connected to the network. It
 currently allows net-id, fixed ip and/or port-id to be specified for the
 option. However, it doesn't

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Robert Li (baoli)
Hi,

sounds like there is enough interest for an IRC meeting before the summit. Do 
you guys know how to schedule a #openstack IRC meeting?

thanks,
Robert

On 10/29/13 6:17 AM, Irena Berezovsky 
ire...@mellanox.commailto:ire...@mellanox.com wrote:

Hi Jiang, Robert,
IRC meeting option works for me.
If I understand your question below, you are looking for a way to tie up 
between requested virtual network(s) and requested PCI device(s). The way we 
did it in our solution  is to map a provider:physical_network to an interface 
that represents the Physical Function. Every virtual network is bound to the 
provider:physical_network, so the PCI device should be allocated based on this 
mapping.  We can  map a PCI alias to the provider:physical_network.

Another topic to discuss is where the mapping between neutron port and PCI 
device should be managed. One way to solve it, is to propagate the allocated 
PCI device details to neutron on port creation.
In case  there is no qbg/qbh support, VF networking configuration should be 
applied locally on the Host.
The question is when and how to apply networking configuration on the PCI 
device?
We see the following options:

· it can be done on port creation.

· It can be done when nova VIF driver is called for vNIC plugging. This 
will require to  have all networking configuration available to the VIF driver 
or send request to the neutron server to obtain it.

· It can be done by  having a dedicated L2 neutron agent on each Host 
that scans for allocated PCI devices  and then retrieves networking 
configuration from the server and configures the device. The agent will be also 
responsible for managing update requests coming from the neutron server.


For macvtap vNIC type assignment, the networking configuration can be applied 
by a dedicated L2 neutron agent.

BR,
Irena

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Tuesday, October 29, 2013 9:04 AM

To: Robert Li (baoli); Irena Berezovsky; 
prashant.upadhy...@aricent.commailto:prashant.upadhy...@aricent.com; 
chris.frie...@windriver.commailto:chris.frie...@windriver.com; He, Yongli; 
Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support

Robert, is it possible to have a IRC meeting? I’d prefer to IRC meeting because 
it’s more openstack style and also can keep the minutes clearly.

To your flow, can you give more detailed example. For example, I can consider 
user specify the instance with –nic option specify a network id, and then how 
nova device the requirement to the PCI device? I assume the network id should 
define the switches that the device can connect to , but how is that 
information translated to the PCI property requirement? Will this translation 
happen before the nova scheduler make host decision?

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; 
prashant.upadhy...@aricent.commailto:prashant.upadhy...@aricent.com; Jiang, 
Yunhong; chris.frie...@windriver.commailto:chris.frie...@windriver.com; He, 
Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

Thank you very much for your comments. See inline.

--Robert

On 10/27/13 3:48 AM, Irena Berezovsky 
ire...@mellanox.commailto:ire...@mellanox.com wrote:

Hi Robert,
Thank you very much for sharing the information regarding your efforts. Can you 
please share your idea of the end to end flow? How do you suggest  to bind Nova 
and Neutron?

The end to end flow is actually encompassed in the blueprints in a nutshell. I 
will reiterate it in below. The binding between Nova and Neutron occurs with 
the neutron v2 API that nova invokes in order to provision the neutron 
services. The vif driver is responsible for plugging in an instance onto the 
networking setup that neutron has created on the host.

Normally, one will invoke nova boot api with the —nic options to specify the 
nic with which the instance will be connected to the network. It currently 
allows net-id, fixed ip and/or port-id to be specified for the option. However, 
it doesn't allow one to specify special networking requirements for the 
instance. Thanks to the nova pci-passthrough work, one can specify PCI 
passthrough device(s) in the nova flavor. But it doesn't provide a means to tie 
up these PCI devices, in the case of ethernet adapters, with networking services. 
Therefore the idea is actually simple as indicated by the blueprint titles, to 
provide means to tie up SRIOV devices with neutron services. A work flow would 
roughly look like this for 'nova boot':

  -- Specifies networking requirements in the —nic option. Specifically for 
SRIOV, allow the following

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Robert Li (baoli)
Hi Yunhong,

I haven't looked at Mellanox in much detail. I think that we'll get more 
details from Irena down the road. Regarding your question, I can only answer 
based on my experience with Cisco's VM-FEX. In a nutshell:
 -- a vNIC is connected to an external switch. Once the host is booted up, 
all the PFs and VFs provisioned on the vNIC will be created, as well as all the 
corresponding ethernet interfaces.
 -- As far as Neutron is concerned, a neutron port can be associated with a 
VF. One way to do so is to specify this requirement in the —nic option, 
providing information such as:
   . PCI alias (this is the same alias as defined in your nova 
blueprints)
   . direct pci-passthrough/macvtap
   . port profileid that is compliant with 802.1Qbh
 -- similar to how you translate the nova flavor with PCI requirements into 
PCI requests for scheduling purposes, Nova API (the nova api component) can 
translate the above into PCI requests for scheduling purposes. I can give more 
detail later on this.

Regarding your last question, since the vNIC is already connected with the 
external switch, the vNIC driver will be responsible for communicating the port 
profile to the external switch. As you already know, libvirt provides several 
ways to specify a VM to be booted up with SRIOV. For example, in the following 
interface definition:


  <interface type='hostdev' managed='yes'>
    <source>
      <address type='pci' domain='0' bus='0x09' slot='0x0' function='0x01'/>
    </source>
    <mac address='01:23:45:67:89:ab'/>
    <virtualport type='802.1Qbh'>
      <parameters profileid='my-port-profile'/>
    </virtualport>
  </interface>


The SRIOV VF (bus 0x09, VF 0x01) will be allocated, and the port profile 
'my-port-profile' will be used to provision this VF. Libvirt will be 
responsible for invoking the vNIC driver to configure this VF with the port 
profile 'my-port-profile'. The driver will talk to the external switch using 
the 802.1Qbh standard to complete the VF's configuration and binding with the VM.


Now that nova PCI passthrough is responsible for 
discovering/scheduling/allocating a VF, the rest of the puzzle is to associate 
this PCI device with the feature that's going to use it, and the feature will 
be responsible for configuring it. You can also see from the above example that 
in one implementation of SRIOV, the feature (in this case neutron) may not need 
to do much in terms of working with the external switch; the work is actually 
done by libvirt behind the scenes.


Now the questions are:

-- how the port profile gets defined/managed

-- how the port profile gets associated with a neutron network

The first question will be specific to the particular product, and therefore a 
particular neutron plugin has to manage that.

There may be several approaches to address the second question. For example, in 
the simplest case, a port profile can be associated with a neutron network. 
This has some significant drawbacks. Since the port profile defines features 
for all the ports that use it, the one port profile to one neutron network 
mapping would mean all the ports on the network will have exactly the same 
features (for example, QoS characteristics). To make it flexible, the binding 
of a port profile to a port may be done at the port creation time.
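
As a sketch of what binding at port creation time could look like (the 
attribute names, in particular profileid inside binding:profile, are 
assumptions rather than existing neutron attributes):

    # Hypothetical request body for creating a neutron port that carries its
    # own port profile instead of inheriting one from the network.
    port_body = {
        'port': {
            'network_id': 'NET_UUID',            # placeholder
            'binding:profile': {
                'profileid': 'my-port-profile',  # assumed field
                'vnic_type': 'direct',           # assumed field
            },
        }
    }

    # Different ports on the same network could then reference different
    # profiles (for example, different QoS characteristics).
    print(port_body['port']['binding:profile']['profileid'])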


Let me know if the above answered your question.


thanks,

Robert

On 10/29/13 3:03 AM, Jiang, Yunhong 
yunhong.ji...@intel.commailto:yunhong.ji...@intel.com wrote:

Robert, is it possible to have a IRC meeting? I’d prefer to IRC meeting because 
it’s more openstack style and also can keep the minutes clearly.

To your flow, can you give a more detailed example? For example, consider that 
the user specifies the instance with a -nic option giving a network id; how 
does nova then derive the requirement for the PCI device? I assume the network 
id should define the switches that the device can connect to, but how is that 
information translated into the PCI property requirement? Will this translation 
happen before the nova scheduler makes the host decision?

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; 
prashant.upadhy...@aricent.commailto:prashant.upadhy...@aricent.com; Jiang, 
Yunhong; chris.frie...@windriver.commailto:chris.frie...@windriver.com; He, 
Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

Thank you very much for your comments. See inline.

--Robert

On 10/27/13 3:48 AM, Irena Berezovsky 
ire...@mellanox.commailto:ire...@mellanox.com wrote:

Hi Robert,
Thank you very much for sharing the information regarding your efforts. Can you 
please share your idea of the end to end flow? How do you suggest  to bind Nova 
and Neutron?

The end

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Robert Li (baoli)
Hi John,

Great to hear from you on Cinder with pcipassthrough. I thought that it
would be coming. I like the idea.

thanks,
Robert

On 10/29/13 6:46 AM, John Garbutt j...@johngarbutt.com wrote:

I would love to see a symmetry between Cinder local volumes and
Neutron PCI passthrough VIFs.

Not entirely sure I have that clear in my head right now, but I just
wanted to share the idea:
* describe resource external to nova that is attached to VM in the API
(block device mapping and/or vif references)
* ideally the nova scheduler needs to be aware of the local capacity,
and how that relates to the above information (relates to the cross
service scheduling issues)
* state of the device should be stored by Neutron/Cinder
(attached/detached, capacity, IP, etc), but still exposed to the
scheduler
* connection params get given to Nova from Neutron/Cinder
* nova still has the vif driver or volume driver to make the final
connection
* the disk should be formatted/expanded, and network info injected in
the same way as before (cloud-init, config drive, DHCP, etc)

John

On 29 October 2013 10:17, Irena Berezovsky ire...@mellanox.com wrote:
 Hi Jiang, Robert,

 IRC meeting option works for me.

 If I understand your question below, you are looking for a way to tie up
 between requested virtual network(s) and requested PCI device(s). The
way we
 did it in our solution  is to map a provider:physical_network to an
 interface that represents the Physical Function. Every virtual network
is
 bound to the provider:physical_network, so the PCI device should be
 allocated based on this mapping.  We can  map a PCI alias to the
 provider:physical_network.



 Another topic to discuss is where the mapping between neutron port and
PCI
 device should be managed. One way to solve it, is to propagate the
allocated
 PCI device details to neutron on port creation.

 In case  there is no qbg/qbh support, VF networking configuration
should be
 applied locally on the Host.

 The question is when and how to apply networking configuration on the
PCI
 device?

 We see the following options:

 · it can be done on port creation.

 · It can be done when nova VIF driver is called for vNIC
plugging.
 This will require to  have all networking configuration available to
the VIF
 driver or send request to the neutron server to obtain it.

 · It can be done by  having a dedicated L2 neutron agent on each
 Host that scans for allocated PCI devices  and then retrieves networking
 configuration from the server and configures the device. The agent will
be
 also responsible for managing update requests coming from the neutron
 server.



 For macvtap vNIC type assignment, the networking configuration can be
 applied by a dedicated L2 neutron agent.



 BR,

 Irena



 From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
 Sent: Tuesday, October 29, 2013 9:04 AM


 To: Robert Li (baoli); Irena Berezovsky; prashant.upadhy...@aricent.com;
 chris.frie...@windriver.com; He, Yongli; Itzik Brown


 Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
Mestery
 (kmestery); Sandhya Dasu (sadasu)
 Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network
 support



 Robert, is it possible to have a IRC meeting? I'd prefer to IRC meeting
 because it's more openstack style and also can keep the minutes clearly.



 To your flow, can you give more detailed example. For example, I can
 consider user specify the instance with -nic option specify a network id,
 and then how nova device the requirement to the PCI device? I assume the
 network id should define the switches that the device can connect to , but
 how is that information translated to the PCI property requirement? Will
 this translation happen before the nova scheduler make host decision?



 Thanks

 --jyh



 From: Robert Li (baoli) [mailto:ba...@cisco.com]
 Sent: Monday, October 28, 2013 12:22 PM
 To: Irena Berezovsky; prashant.upadhy...@aricent.com; Jiang, Yunhong;
 chris.frie...@windriver.com; He, Yongli; Itzik Brown
 Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
Mestery
 (kmestery); Sandhya Dasu (sadasu)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support



 Hi Irena,



 Thank you very much for your comments. See inline.



 --Robert



 On 10/27/13 3:48 AM, Irena Berezovsky ire...@mellanox.com wrote:



 Hi Robert,

 Thank you very much for sharing the information regarding your efforts.
Can
 you please share your idea of the end to end flow? How do you suggest
to
 bind Nova and Neutron?



 The end to end flow is actually encompassed in the blueprints in a
nutshell.
 I will reiterate it in below. The binding between Nova and Neutron
occurs
 with the neutron v2 API that nova invokes in order to provision the
neutron
 services. The vif driver is responsible for plugging in an instance
onto the
 networking setup that neutron has created on the host.



 Normally, one will invoke nova boot api

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Henry Gessau
Lots of great info and discussion going on here.

One additional thing I would like to mention is regarding PF and VF usage.

Normally VFs will be assigned to instances, and the PF will either not be
used at all, or maybe some agent in the host of the compute node might have
access to the PF for something (management?).

There is a neutron design track around the development of service VMs.
These are dedicated instances that run neutron services like routers,
firewalls, etc. It is plausible that a service VM would like to use PCI
passthrough and get the entire PF. This would allow it to have complete
control over a physical link, which I think will be wanted in some cases.

-- 
Henry

On Tue, Oct 29, at 10:23 am, Irena Berezovsky ire...@mellanox.com wrote:

 Hi,
 
 I would like to share some details regarding the support provided by
 Mellanox plugin. It enables networking via SRIOV pass-through devices or
 macvtap interfaces.  The plugin is available here:
 https://github.com/openstack/neutron/tree/master/neutron/plugins/mlnx.
 
 To support either PCI pass-through device and macvtap interface type of
 vNICs, we set neutron port profile:vnic_type according to the required VIF
 type and then use the created port to ‘nova boot’ the VM.
 
 To  overcome the missing scheduler awareness for PCI devices which was not
 part of the Havana release yet, we
 
 have an additional service (embedded switch Daemon) that runs on each
 compute node.  
 
 This service manages the SRIOV resources allocation,  answers vNICs
 discovery queries and applies VLAN/MAC configuration using standard Linux
 APIs (code is here: https://github.com/mellanox-openstack/mellanox-eswitchd
 ).  The embedded switch Daemon serves as a glue layer between VIF Driver and
 Neutron Agent.
 
 In the Icehouse Release when SRIOV resources allocation is already part of
 the Nova, we plan to eliminate the need in embedded switch daemon service.
 So what is left to figure out is how to tie up between neutron port and PCI
 device and invoke networking configuration.
 
  
 
 In our case what we have is actually the Hardware VEB that is not programmed
 via either 802.1Qbg or 802.1Qbh, but configured locally by Neutron Agent. We
 also support both Ethernet and InfiniBand physical network L2 technology.
 This means that we apply different configuration commands  to set
 configuration on VF.
 
  
 
 I guess what we have to figure out is how to support the generic case for
 the PCI device networking support, for HW VEB, 802.1Qbg and 802.1Qbh cases.
 
  
 
 BR,
 
 Irena
 
  
 
 *From:*Robert Li (baoli) [mailto:ba...@cisco.com]
 *Sent:* Tuesday, October 29, 2013 3:31 PM
 *To:* Jiang, Yunhong; Irena Berezovsky; prashant.upadhy...@aricent.com;
 chris.frie...@windriver.com; He, Yongli; Itzik Brown
 *Cc:* OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
 Mestery (kmestery); Sandhya Dasu (sadasu)
 *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network 
 support
 
  
 
 Hi Yunhong,
 
  
 
 I haven't looked at Mellanox in much detail. I think that we'll get more
 details from Irena down the road. Regarding your question, I can only answer
 based on my experience with Cisco's VM-FEX. In a nutshell:
 
  -- a vNIC is connected to an external switch. Once the host is booted
 up, all the PFs and VFs provisioned on the vNIC will be created, as well as
 all the corresponding ethernet interfaces . 
 
  -- As far as Neutron is concerned, a neutron port can be associated
 with a VF. One way to do so is to specify this requirement in the —nic
 option, providing information such as:
 
. PCI alias (this is the same alias as defined in your nova
 blueprints)
 
. direct pci-passthrough/macvtap
 
. port profileid that is compliant with 802.1Qbh
 
  -- similar to how you translate the nova flavor with PCI requirements
 to PCI requests for scheduling purpose, Nova API (the nova api component)
 can translate the above to PCI requests for scheduling purpose. I can give
 more detail later on this. 
 
  
 
 Regarding your last question, since the vNIC is already connected with the
 external switch, the vNIC driver will be responsible for communicating the
 port profile to the external switch. As you have already known, libvirt
 provides several ways to specify a VM to be booted up with SRIOV. For
 example, in the following interface definition: 
 
   
 
  <interface type='hostdev' managed='yes'>
    <source>
      <address type='pci' domain='0' bus='0x09' slot='0x0' function='0x01'/>
    </source>
    <mac address='01:23:45:67:89:ab'/>
    <virtualport type='802.1Qbh'>
      <parameters profileid='my-port-profile'/>
    </virtualport>
  </interface>
 
  
 
 The SRIOV VF (bus 0x09, VF 0x01) will be allocated, and the port profile 
 'my-port-profile' will be used to provision this VF. Libvirt will be 
 responsible for invoking the vNIC driver to configure

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong

 * describe resource external to nova that is attached to VM in the API
 (block device mapping and/or vif references)
 * ideally the nova scheduler needs to be aware of the local capacity,
 and how that relates to the above information (relates to the cross
 service scheduling issues)

I think this is possibly a bit different. A volume is certainly managed by 
Cinder, but PCI devices are currently managed by nova. So we possibly need nova 
to translate the information (possibly before the nova scheduler).

 * state of the device should be stored by Neutron/Cinder
 (attached/detached, capacity, IP, etc), but still exposed to the
 scheduler

I'm not sure if we can keep the state of the device in Neutron. Currently nova 
manages all PCI devices.

Thanks
--jyh


 * connection params get given to Nova from Neutron/Cinder
 * nova still has the vif driver or volume driver to make the final connection
 * the disk should be formatted/expanded, and network info injected in
 the same way as before (cloud-init, config drive, DHCP, etc)
 
 John
 
 On 29 October 2013 10:17, Irena Berezovsky ire...@mellanox.com
 wrote:
  Hi Jiang, Robert,
 
  IRC meeting option works for me.
 
  If I understand your question below, you are looking for a way to tie up
  between requested virtual network(s) and requested PCI device(s). The
 way we
  did it in our solution  is to map a provider:physical_network to an
  interface that represents the Physical Function. Every virtual network is
  bound to the provider:physical_network, so the PCI device should be
  allocated based on this mapping.  We can  map a PCI alias to the
  provider:physical_network.
 
 
 
  Another topic to discuss is where the mapping between neutron port
 and PCI
  device should be managed. One way to solve it, is to propagate the
 allocated
  PCI device details to neutron on port creation.
 
  In case  there is no qbg/qbh support, VF networking configuration
 should be
  applied locally on the Host.
 
  The question is when and how to apply networking configuration on the
 PCI
  device?
 
  We see the following options:
 
  * it can be done on port creation.
 
  * It can be done when nova VIF driver is called for vNIC
 plugging.
  This will require to  have all networking configuration available to the
 VIF
  driver or send request to the neutron server to obtain it.
 
  * It can be done by  having a dedicated L2 neutron agent on
 each
  Host that scans for allocated PCI devices  and then retrieves networking
  configuration from the server and configures the device. The agent will
 be
  also responsible for managing update requests coming from the neutron
  server.
 
 
 
  For macvtap vNIC type assignment, the networking configuration can be
  applied by a dedicated L2 neutron agent.
 
 
 
  BR,
 
  Irena
 
 
 
  From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
  Sent: Tuesday, October 29, 2013 9:04 AM
 
 
  To: Robert Li (baoli); Irena Berezovsky;
 prashant.upadhy...@aricent.com;
  chris.frie...@windriver.com; He, Yongli; Itzik Brown
 
 
  Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
 Mestery
  (kmestery); Sandhya Dasu (sadasu)
  Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network
  support
 
 
 
  Robert, is it possible to have a IRC meeting? I'd prefer to IRC meeting
  because it's more openstack style and also can keep the minutes
 clearly.
 
 
 
  To your flow, can you give more detailed example. For example, I can
  consider user specify the instance with -nic option specify a network id,
  and then how nova device the requirement to the PCI device? I assume
 the
  network id should define the switches that the device can connect to ,
 but
  how is that information translated to the PCI property requirement? Will
  this translation happen before the nova scheduler make host decision?
 
 
 
  Thanks
 
  --jyh
 
 
 
  From: Robert Li (baoli) [mailto:ba...@cisco.com]
  Sent: Monday, October 28, 2013 12:22 PM
  To: Irena Berezovsky; prashant.upadhy...@aricent.com; Jiang, Yunhong;
  chris.frie...@windriver.com; He, Yongli; Itzik Brown
  Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
 Mestery
  (kmestery); Sandhya Dasu (sadasu)
  Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
  support
 
 
 
  Hi Irena,
 
 
 
  Thank you very much for your comments. See inline.
 
 
 
  --Robert
 
 
 
  On 10/27/13 3:48 AM, Irena Berezovsky ire...@mellanox.com
 wrote:
 
 
 
  Hi Robert,
 
  Thank you very much for sharing the information regarding your efforts.
 Can
  you please share your idea of the end to end flow? How do you suggest
 to
  bind Nova and Neutron?
 
 
 
  The end to end flow is actually encompassed in the blueprints in a
 nutshell.
  I will reiterate it in below. The binding between Nova and Neutron
 occurs
  with the neutron v2 API that nova invokes in order to provision the
 neutron
  services. The vif driver is responsible for plugging

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong
Your explanation of the virtual network and the physical network is quite clear 
and should work well. We need to change nova code to achieve it, including 
getting the physical network for the virtual network, passing the physical 
network requirement to the filter properties, etc.

For your port method, do you mean we always pass a network id to 'nova boot' 
and nova will create the port during VM boot, am I right? Also, how does nova 
know that it needs to allocate a PCI device for the port? I'd suppose that in 
an SR-IOV NIC environment, the user doesn't need to specify the PCI 
requirement. Instead, the PCI requirement should come from the network 
configuration and the image properties. Or do you think the user still needs to 
pass a flavor with a PCI request?
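
To illustrate the alternative being suggested here, a small sketch (all of the 
names and mappings below are invented for the example):

    # Decide whether a port needs an SR-IOV PCI device from the network/port
    # configuration instead of from the flavor.
    NETWORKS = {
        'NET_UUID': {'provider:physical_network': 'physnet1'},
    }
    PHYSNET_TO_PCI_ALIAS = {'physnet1': 'mlnx_vf'}   # assumed admin mapping

    def pci_request_for_port(port):
        net = NETWORKS[port['network_id']]
        alias = PHYSNET_TO_PCI_ALIAS.get(net['provider:physical_network'])
        if port.get('vnic_type') not in ('direct', 'macvtap') or alias is None:
            return None                   # a normal vNIC, no PCI device needed
        return {'alias': alias, 'count': 1}

    print(pci_request_for_port({'network_id': 'NET_UUID',
                                'vnic_type': 'direct'}))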

--jyh


From: Irena Berezovsky [mailto:ire...@mellanox.com]
Sent: Tuesday, October 29, 2013 3:17 AM
To: Jiang, Yunhong; Robert Li (baoli); prashant.upadhy...@aricent.com; 
chris.frie...@windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Jiang, Robert,
IRC meeting option works for me.
If I understand your question below, you are looking for a way to tie up 
between requested virtual network(s) and requested PCI device(s). The way we 
did it in our solution  is to map a provider:physical_network to an interface 
that represents the Physical Function. Every virtual network is bound to the 
provider:physical_network, so the PCI device should be allocated based on this 
mapping.  We can  map a PCI alias to the provider:physical_network.

Another topic to discuss is where the mapping between neutron port and PCI 
device should be managed. One way to solve it, is to propagate the allocated 
PCI device details to neutron on port creation.
In case  there is no qbg/qbh support, VF networking configuration should be 
applied locally on the Host.
The question is when and how to apply networking configuration on the PCI 
device?
We see the following options:

* it can be done on port creation.

* It can be done when nova VIF driver is called for vNIC plugging. This 
will require to  have all networking configuration available to the VIF driver 
or send request to the neutron server to obtain it.

* It can be done by  having a dedicated L2 neutron agent on each Host 
that scans for allocated PCI devices  and then retrieves networking 
configuration from the server and configures the device. The agent will be also 
responsible for managing update requests coming from the neutron server.


For macvtap vNIC type assignment, the networking configuration can be applied 
by a dedicated L2 neutron agent.

BR,
Irena

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Tuesday, October 29, 2013 9:04 AM

To: Robert Li (baoli); Irena Berezovsky; 
prashant.upadhy...@aricent.commailto:prashant.upadhy...@aricent.com; 
chris.frie...@windriver.commailto:chris.frie...@windriver.com; He, Yongli; 
Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support

Robert, is it possible to have a IRC meeting? I'd prefer to IRC meeting because 
it's more openstack style and also can keep the minutes clearly.

To your flow, can you give more detailed example. For example, I can consider 
user specify the instance with -nic option specify a network id, and then how 
nova device the requirement to the PCI device? I assume the network id should 
define the switches that the device can connect to , but how is that 
information translated to the PCI property requirement? Will this translation 
happen before the nova scheduler make host decision?

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; 
prashant.upadhy...@aricent.commailto:prashant.upadhy...@aricent.com; Jiang, 
Yunhong; chris.frie...@windriver.commailto:chris.frie...@windriver.com; He, 
Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

Thank you very much for your comments. See inline.

--Robert

On 10/27/13 3:48 AM, Irena Berezovsky 
ire...@mellanox.commailto:ire...@mellanox.com wrote:

Hi Robert,
Thank you very much for sharing the information regarding your efforts. Can you 
please share your idea of the end to end flow? How do you suggest  to bind Nova 
and Neutron?

The end to end flow is actually encompassed in the blueprints in a nutshell. I 
will reiterate it in below. The binding between Nova and Neutron occurs with 
the neutron v2 API that nova invokes in order to provision the neutron 
services. The vif driver is responsible for plugging in an instance onto

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong
Henry, why do you think the service VM needs the entire PF instead of a VF? I 
think the SR-IOV NIC should provide QoS and performance isolation.

As to assigning an entire PCI device to a guest, that should be OK since 
usually the PF and the VF have different device IDs. The tricky thing is that, 
at least for some PCI devices, you can't configure SR-IOV to be enabled on some 
NICs while leaving it disabled on others.

Thanks
--jyh

 -Original Message-
 From: Henry Gessau [mailto:ges...@cisco.com]
 Sent: Tuesday, October 29, 2013 8:10 AM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support
 
 Lots of great info and discussion going on here.
 
 One additional thing I would like to mention is regarding PF and VF usage.
 
 Normally VFs will be assigned to instances, and the PF will either not be
 used at all, or maybe some agent in the host of the compute node might
 have
 access to the PF for something (management?).
 
 There is a neutron design track around the development of service VMs.
 These are dedicated instances that run neutron services like routers,
 firewalls, etc. It is plausible that a service VM would like to use PCI
 passthrough and get the entire PF. This would allow it to have complete
 control over a physical link, which I think will be wanted in some cases.
 
 --
 Henry
 
 On Tue, Oct 29, at 10:23 am, Irena Berezovsky ire...@mellanox.com
 wrote:
 
  Hi,
 
  I would like to share some details regarding the support provided by
  Mellanox plugin. It enables networking via SRIOV pass-through devices
 or
  macvtap interfaces.  It plugin is available here:
 
 https://github.com/openstack/neutron/tree/master/neutron/plugins/mln
 x.
 
  To support either PCI pass-through device and macvtap interface type of
  vNICs, we set neutron port profile:vnic_type according to the required
 VIF
  type and then use the created port to 'nova boot' the VM.
 
  To  overcome the missing scheduler awareness for PCI devices which
 was not
  part of the Havana release yet, we
 
  have an additional service (embedded switch Daemon) that runs on each
  compute node.
 
  This service manages the SRIOV resources allocation,  answers vNICs
  discovery queries and applies VLAN/MAC configuration using standard
 Linux
  APIs (code is here:
 https://github.com/mellanox-openstack/mellanox-eswitchd
  ).  The embedded switch Daemon serves as a glue layer between VIF
 Driver and
  Neutron Agent.
 
  In the Icehouse Release when SRIOV resources allocation is already part
 of
  the Nova, we plan to eliminate the need in embedded switch daemon
 service.
  So what is left to figure out is how to tie up between neutron port and
 PCI
  device and invoke networking configuration.
 
 
 
  In our case what we have is actually the Hardware VEB that is not
 programmed
  via either 802.1Qbg or 802.1Qbh, but configured locally by Neutron
 Agent. We
  also support both Ethernet and InfiniBand physical network L2
 technology.
  This means that we apply different configuration commands  to set
  configuration on VF.
 
 
 
  I guess what we have to figure out is how to support the generic case for
  the PCI device networking support, for HW VEB, 802.1Qbg and
 802.1Qbh cases.
 
 
 
  BR,
 
  Irena
 
 
 
  *From:*Robert Li (baoli) [mailto:ba...@cisco.com]
  *Sent:* Tuesday, October 29, 2013 3:31 PM
  *To:* Jiang, Yunhong; Irena Berezovsky;
 prashant.upadhy...@aricent.com;
  chris.frie...@windriver.com; He, Yongli; Itzik Brown
  *Cc:* OpenStack Development Mailing List; Brian Bowen (brbowen);
 Kyle
  Mestery (kmestery); Sandhya Dasu (sadasu)
  *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through
 network support
 
 
 
  Hi Yunhong,
 
 
 
  I haven't looked at Mellanox in much detail. I think that we'll get more
  details from Irena down the road. Regarding your question, I can only
 answer
  based on my experience with Cisco's VM-FEX. In a nutshell:
 
   -- a vNIC is connected to an external switch. Once the host is
 booted
  up, all the PFs and VFs provisioned on the vNIC will be created, as well as
  all the corresponding ethernet interfaces .
 
   -- As far as Neutron is concerned, a neutron port can be
 associated
  with a VF. One way to do so is to specify this requirement in the -nic
  option, providing information such as:
 
 . PCI alias (this is the same alias as defined in your nova
  blueprints)
 
 . direct pci-passthrough/macvtap
 
 . port profileid that is compliant with 802.1Qbh
 
   -- similar to how you translate the nova flavor with PCI
 requirements
  to PCI requests for scheduling purpose, Nova API (the nova api
 component)
  can translate the above to PCI requests for scheduling purpose. I can
 give
  more detail later on this.
 
 
 
  Regarding your last question, since the vNIC is already connected with
 the
  external switch, the vNIC driver will be responsible for communicating

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Henry Gessau
On Tue, Oct 29, at 4:31 pm, Jiang, Yunhong yunhong.ji...@intel.com wrote:

 Henry,why do you think the service VM need the entire PF instead of a
 VF? I think the SR-IOV NIC should provide QoS and performance isolation.

I was speculating. I just thought it might be a good idea to leave open the
possibility of assigning a PF to a VM if the need arises.

Neutron service VMs are a new thing. I will be following the discussions and
there is a summit session for them. It remains to be seen if there is any
desire/need for full PF ownership of NICs. But if a service VM owns the PF
and has the right NIC driver it could do some advanced features with it.

 As to assign entire PCI device to a guest, that should be ok since
 usually PF and VF has different device ID, the tricky thing is, at least
 for some PCI devices, you can't configure that some NIC will have SR-IOV
 enabled while others not.

Thanks for the warning. :) Perhaps the cloud admin might plug in an extra
NIC in just a few nodes (one or two per rack, maybe) for the purpose of
running service VMs there. Again, just speculating. I don't know how hard it
is to manage non-homogenous nodes.

 
 Thanks
 --jyh
 
 -Original Message-
 From: Henry Gessau [mailto:ges...@cisco.com]
 Sent: Tuesday, October 29, 2013 8:10 AM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support
 
 Lots of great info and discussion going on here.
 
 One additional thing I would like to mention is regarding PF and VF usage.
 
 Normally VFs will be assigned to instances, and the PF will either not be
 used at all, or maybe some agent in the host of the compute node might
 have
 access to the PF for something (management?).
 
 There is a neutron design track around the development of service VMs.
 These are dedicated instances that run neutron services like routers,
 firewalls, etc. It is plausible that a service VM would like to use PCI
 passthrough and get the entire PF. This would allow it to have complete
 control over a physical link, which I think will be wanted in some cases.
 
 --
 Henry
 
 On Tue, Oct 29, at 10:23 am, Irena Berezovsky ire...@mellanox.com
 wrote:
 
  Hi,
 
  I would like to share some details regarding the support provided by
  Mellanox plugin. It enables networking via SRIOV pass-through devices
 or
  macvtap interfaces.  It plugin is available here:
 
 https://github.com/openstack/neutron/tree/master/neutron/plugins/mln
 x.
 
  To support either PCI pass-through device and macvtap interface type of
  vNICs, we set neutron port profile:vnic_type according to the required
 VIF
  type and then use the created port to 'nova boot' the VM.
 
  To  overcome the missing scheduler awareness for PCI devices which
 was not
  part of the Havana release yet, we
 
  have an additional service (embedded switch Daemon) that runs on each
  compute node.
 
  This service manages the SRIOV resources allocation,  answers vNICs
  discovery queries and applies VLAN/MAC configuration using standard
 Linux
  APIs (code is here:
 https://github.com/mellanox-openstack/mellanox-eswitchd
  ).  The embedded switch Daemon serves as a glue layer between VIF
 Driver and
  Neutron Agent.
 
  In the Icehouse Release when SRIOV resources allocation is already part
 of
  the Nova, we plan to eliminate the need in embedded switch daemon
 service.
  So what is left to figure out is how to tie up between neutron port and
 PCI
  device and invoke networking configuration.
 
 
 
  In our case what we have is actually the Hardware VEB that is not
 programmed
  via either 802.1Qbg or 802.1Qbh, but configured locally by Neutron
 Agent. We
  also support both Ethernet and InfiniBand physical network L2
 technology.
  This means that we apply different configuration commands  to set
  configuration on VF.
 
 
 
  I guess what we have to figure out is how to support the generic case for
  the PCI device networking support, for HW VEB, 802.1Qbg and
 802.1Qbh cases.
 
 
 
  BR,
 
  Irena
 
 
 
  *From:*Robert Li (baoli) [mailto:ba...@cisco.com]
  *Sent:* Tuesday, October 29, 2013 3:31 PM
  *To:* Jiang, Yunhong; Irena Berezovsky;
 prashant.upadhy...@aricent.com;
  chris.frie...@windriver.com; He, Yongli; Itzik Brown
  *Cc:* OpenStack Development Mailing List; Brian Bowen (brbowen);
 Kyle
  Mestery (kmestery); Sandhya Dasu (sadasu)
  *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through
 network support
 
 
 
  Hi Yunhong,
 
 
 
  I haven't looked at Mellanox in much detail. I think that we'll get more
  details from Irena down the road. Regarding your question, I can only
 answer
  based on my experience with Cisco's VM-FEX. In a nutshell:
 
   -- a vNIC is connected to an external switch. Once the host is
 booted
  up, all the PFs and VFs provisioned on the vNIC will be created, as well as
  all the corresponding ethernet interfaces .
 
   -- As far as Neutron is concerned, a neutron port

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong


 -Original Message-
 From: Henry Gessau [mailto:ges...@cisco.com]
 Sent: Tuesday, October 29, 2013 2:23 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support
 
 On Tue, Oct 29, at 4:31 pm, Jiang, Yunhong yunhong.ji...@intel.com
 wrote:
 
  Henry,why do you think the service VM need the entire PF instead of a
  VF? I think the SR-IOV NIC should provide QoS and performance
 isolation.
 
 I was speculating. I just thought it might be a good idea to leave open the
 possibility of assigning a PF to a VM if the need arises.
 
 Neutron service VMs are a new thing. I will be following the discussions
 and
 there is a summit session for them. It remains to be seen if there is any
 desire/need for full PF ownership of NICs. But if a service VM owns the PF
 and has the right NIC driver it could do some advanced features with it.
 
At least in the current PCI implementation, if a device does not have SR-IOV 
enabled, then that device will be exposed and can be assigned (is this your 
so-called PF?). If a device has SR-IOV enabled, then only the VFs are exposed 
and the PF is hidden from the resource tracker. The reason is that, when SR-IOV 
is enabled, the PF is mostly used to configure and manage the VFs, and it would 
be a security issue to expose the PF to a guest.

I'm not sure which PF you are talking about: the PF with or without SR-IOV 
enabled.
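
A toy sketch of that behavior (the field names are illustrative, not nova's 
actual device schema):

    devices = [
        {'address': '0000:09:00.0', 'is_pf': True,  'sriov_enabled': True},
        {'address': '0000:09:00.1', 'is_pf': False, 'sriov_enabled': True},
        {'address': '0000:0a:00.0', 'is_pf': True,  'sriov_enabled': False},
    ]

    # Only VFs and devices without SR-IOV enabled are exposed as assignable;
    # a PF with SR-IOV enabled stays hidden from the resource tracker.
    assignable = [d for d in devices
                  if not d['is_pf'] or not d['sriov_enabled']]
    print([d['address'] for d in assignable])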

I totally agree that assigning a PCI NIC to a service VM has a lot of benefits 
from both the performance and the isolation points of view.

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Henry Gessau
On Tue, Oct 29, at 5:52 pm, Jiang, Yunhong yunhong.ji...@intel.com wrote:

 -Original Message-
 From: Henry Gessau [mailto:ges...@cisco.com]
 Sent: Tuesday, October 29, 2013 2:23 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support
 
 On Tue, Oct 29, at 4:31 pm, Jiang, Yunhong yunhong.ji...@intel.com
 wrote:
 
  Henry,why do you think the service VM need the entire PF instead of a
  VF? I think the SR-IOV NIC should provide QoS and performance
 isolation.
 
 I was speculating. I just thought it might be a good idea to leave open the
 possibility of assigning a PF to a VM if the need arises.
 
 Neutron service VMs are a new thing. I will be following the discussions
 and
 there is a summit session for them. It remains to be seen if there is any
 desire/need for full PF ownership of NICs. But if a service VM owns the PF
 and has the right NIC driver it could do some advanced features with it.
 
 At least in current PCI implementation, if a device has no SR-IOV
 enabled, then that device will be exposed and can be assigned (is this
 your so-called PF?).

Apologies, this was not clear to me until now. Thanks. I am not aware of a
use-case for a service VM needing to control VFs. So you are right, I should
not have talked about PF but rather just the entire NIC device in
passthrough mode, no SR-IOV needed.

So the admin will need to know: Put a NIC in SR-IOV mode if it is to be used
by multiple VMs. Put a NIC in single device passthrough mode if it is to be
used by one service VM.

 If a device has SR-IOV enabled, then only VF be
 exposed and the PF is hidden from resource tracker. The reason is, when
 SR-IOV enabled, the PF is mostly used to configure and management the
 VFs, and it will be security issue to expose the PF to a guest.

Thanks for bringing up the security issue. If a physical network interface
is connected in a special way to some switch/router with the intention being
for it to be used only by a service VM, then close attention must be paid to
security. The device owner might get some low-level network access that can
be misused.

 I'm not sure if you are talking about the PF, are you talking about the
 PF w/ or w/o SR-IOV enabled.
 
 I totally agree that assign a PCI NIC to service VM have a lot of benefit
 from both performance and isolation point of view.
 
 Thanks
 --jyh
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong


 -Original Message-
 From: Isaku Yamahata [mailto:isaku.yamah...@gmail.com]
 Sent: Tuesday, October 29, 2013 8:24 PM
 To: OpenStack Development Mailing List (not for usage questions)
 Cc: isaku.yamah...@gmail.com; Itzik Brown
 Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
 support
 
 Hi Yunhong.
 
 On Tue, Oct 29, 2013 at 08:22:40PM +,
 Jiang, Yunhong yunhong.ji...@intel.com wrote:
 
   * describe resource external to nova that is attached to VM in the API
   (block device mapping and/or vif references)
   * ideally the nova scheduler needs to be aware of the local capacity,
   and how that relates to the above information (relates to the cross
   service scheduling issues)
 
  I think this possibly a bit different. For volume, it's sure managed by
 Cinder, but for PCI devices, currently
  It ;s managed by nova. So we possibly need nova to translate the
 information (possibly before nova scheduler).
 
   * state of the device should be stored by Neutron/Cinder
   (attached/detached, capacity, IP, etc), but still exposed to the
   scheduler
 
  I'm not sure if we can keep the state of the device in Neutron. Currently
 nova manage all PCI devices.
 
 Yes, with the current implementation, nova manages PCI devices and it
 works.
 That's great. It will remain so in Icehouse cycle (maybe also J?).
 
 But how about long term direction?
 Neutron should know/manage such network related resources on
 compute nodes?

So you mean the PCI device management will be split between Nova and Neutron? 
For example, non-NIC devices owned by nova and NIC devices owned by neutron?

There have been so many discussions of scheduler enhancement, like 
https://etherpad.openstack.org/p/grizzly-split-out-scheduling , so possibly 
that's the right direction? Let's wait for the summit discussion.

 The implementation in Nova will be moved into Neutron like what Cinder
 did?
 any opinions/thoughts?
 It seems that not so many Neutron developers are interested in PCI
 passthrough at the moment, though.
 
 There are use cases for this, I think.
 For example, some compute nodes use OVS plugin, another nodes LB
 plugin.
 (Right now it may not possible easily, but it will be with ML2 plugin and
 mechanism driver). User wants their VMs to run on nodes with OVS plugin
 for
 some reason(e.g. performance difference).
 Such usage would be handled similarly.
 
 Thanks,
 ---
 Isaku Yamahata
 
 
 
  Thanks
  --jyh
 
 
   * connection params get given to Nova from Neutron/Cinder
   * nova still has the vif driver or volume driver to make the final
 connection
   * the disk should be formatted/expanded, and network info injected in
   the same way as before (cloud-init, config drive, DHCP, etc)
  
   John
  
   On 29 October 2013 10:17, Irena Berezovsky
 ire...@mellanox.com
   wrote:
Hi Jiang, Robert,
   
IRC meeting option works for me.
   
If I understand your question below, you are looking for a way to tie the
requested virtual network(s) to the requested PCI device(s). The way we did
it in our solution is to map a provider:physical_network to an interface
that represents the Physical Function. Every virtual network is bound to the
provider:physical_network, so the PCI device should be allocated based on
this mapping. We can map a PCI alias to the provider:physical_network.
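
Purely as an illustration of this idea (the way a device group is tied to a
physical network below is an assumption for the sketch, not an agreed design),
the Havana-era nova options plus the ML2 configuration could look roughly like:

    # nova.conf on the compute node (sketch): whitelist the VFs and give them
    # a PCI alias; here the tie to the physical network is expressed only by
    # naming convention, since no agreed-upon option exists yet.
    pci_passthrough_whitelist = [{"vendor_id": "8086", "product_id": "1520"}]
    pci_alias = {"vendor_id": "8086", "product_id": "1520", "name": "physnet1-vf"}

    # neutron ML2 plugin config (sketch): virtual networks that should be
    # backed by these PCI devices are created on the same physical network.
    [ml2_type_vlan]
    network_vlan_ranges = physnet1:100:200

A network created with provider:physical_network set to physnet1 would then be
the one whose ports get devices from that group.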
   
   
   
Another topic to discuss is where the mapping between a neutron port and a
PCI device should be managed. One way to solve it is to propagate the
allocated PCI device details to neutron on port creation.

In case there is no qbg/qbh support, the VF networking configuration should
be applied locally on the host.

The question is when and how to apply the networking configuration on the
PCI device?
   
We see the following options:

* It can be done on port creation.

* It can be done when the nova VIF driver is called for vNIC plugging. This
will require having all the networking configuration available to the VIF
driver, or sending a request to the neutron server to obtain it.

* It can be done by having a dedicated L2 neutron agent on each host that
scans for allocated PCI devices and then retrieves the networking
configuration from the server and configures the device. The agent will
also be responsible for managing update requests coming from the neutron
server. (A rough sketch of such an agent follows below.)
   
   
   
For macvtap vNIC type assignment, the networking configuration can be
applied by a dedicated L2 neutron agent.
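
A minimal sketch of the dedicated-agent option mentioned above, in which every
function is a placeholder for illustration and not an existing nova or neutron
API, could look like:

    # Sketch only: each helper below is a placeholder, not a real API.
    import time

    POLL_INTERVAL = 2  # seconds between scans of the host

    def discover_assigned_vfs():
        # Placeholder: scan the host for VFs allocated to instances;
        # would return e.g. a list of (pci_address, mac_address) tuples.
        return []

    def get_port_config(mac_address):
        # Placeholder: ask the neutron server for the port bound to this MAC;
        # would return e.g. {"vlan": 100, "admin_state_up": True} or None.
        return None

    def apply_vf_config(pci_address, port_config):
        # Placeholder: program the VF (VLAN, spoof checking, link state).
        pass

    def main_loop():
        configured = set()
        while True:
            for pci_address, mac_address in discover_assigned_vfs():
                if pci_address in configured:
                    continue
                port_config = get_port_config(mac_address)
                if port_config is not None:
                    apply_vf_config(pci_address, port_config)
                    configured.add(pci_address)
            time.sleep(POLL_INTERVAL)

    if __name__ == "__main__":
        main_loop()

Handling update requests from the server would amount to re-fetching the port
and re-applying the configuration in the same loop.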
   
   
   
BR,
   
Irena
   
   
   
From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Tuesday, October 29, 2013 9:04 AM
   
   
To: Robert Li (baoli); Irena Berezovsky;
   prashant.upadhy...@aricent.com;
chris.frie...@windriver.com; He, Yongli; Itzik Brown
   
   
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen);
 Kyle

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-28 Thread yongli he
 is planned to be discussed
during the summit: http://summit.openstack.org/cfp/details/129. I
think it’s worth drilling down into a more detailed proposal and
presenting it during the summit, especially since it impacts both the
nova and neutron projects.

I agree. Maybe we can steal some time in that discussion.

Would you be interested in collaborating on this effort? Would you
be interested in exchanging more emails or setting up an IRC/WebEx meeting
this week before the summit?


Sure. If folks want to discuss it before the summit, we can schedule a 
webex later this week. Otherwise, we can continue the discussion over 
email.


Regards,

Irena

*From:* Robert Li (baoli) [mailto:ba...@cisco.com]
*Sent:* Friday, October 25, 2013 11:16 PM
*To:* prashant.upadhy...@aricent.com; Irena Berezovsky; yunhong.ji...@intel.com;
chris.frie...@windriver.com; yongli...@intel.com
*Cc:* OpenStack Development Mailing List; Brian Bowen (brbowen);
Kyle Mestery (kmestery); Sandhya Dasu (sadasu)
*Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through
network support

Hi Irena,

This is Robert Li from Cisco Systems. Recently, I was tasked to
investigate such support for Cisco's systems that support VM-FEX,
which is an SRIOV technology supporting 802.1Qbh. I was able to
bring up nova instances with SRIOV interfaces, and establish
networking between the instances that employ the SRIOV
interfaces. Certainly, this was accomplished with hacking and some
manual intervention. Based on this experience and my study of
the two existing nova pci-passthrough blueprints that have been
implemented and committed into Havana
(https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base and
https://blueprints.launchpad.net/nova/+spec/pci-passthrough-libvirt),
 I registered a couple of blueprints (one on Nova side, the other
on the Neutron side):

https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov

https://blueprints.launchpad.net/neutron/+spec/pci-passthrough-sriov

in order to address SRIOV support in openstack.

Please take a look at them and see if they make sense, and let me
know any comments and questions. We can also discuss this in the
summit, I suppose.

I noticed that there is another thread on this topic, so I am copying
those folks from that thread as well.

thanks,

Robert

On 10/16/13 4:32 PM, Irena Berezovsky ire...@mellanox.com wrote:

Hi,

As one of the next steps for PCI pass-through, I would like to
discuss the support for PCI pass-through vNICs.

While nova takes care of PCI pass-through device resource
management and VIF settings, neutron should manage their
networking configuration.

I would like to register a summit proposal to discuss the
support for PCI pass-through networking.

I am not sure what would be the right topic to discuss PCI
pass-through networking, since it involves both nova and neutron.

There is already a session registered by Yongli on the nova topic
to discuss the PCI pass-through next steps.

I think PCI pass-through networking is quite a big topic and
it is worth having a separate discussion.

Are there any other people who are interested in discussing it and
sharing their thoughts and experience?

Regards,

Irena



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-25 Thread Robert Li (baoli)
Hi Irena,

This is Robert Li from Cisco Systems. Recently, I was tasked to investigate 
such support for Cisco's systems that support VM-FEX, which is an SRIOV 
technology supporting 802.1Qbh. I was able to bring up nova instances with 
SRIOV interfaces, and establish networking between the instances that 
employ the SRIOV interfaces. Certainly, this was accomplished with hacking 
and some manual intervention. Based on this experience and my study of the 
two existing nova pci-passthrough blueprints that have been implemented and 
committed into Havana 
(https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base and
https://blueprints.launchpad.net/nova/+spec/pci-passthrough-libvirt),  I 
registered a couple of blueprints (one on Nova side, the other on the Neutron 
side):

https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov
https://blueprints.launchpad.net/neutron/+spec/pci-passthrough-sriov

in order to address SRIOV support in openstack.
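
For reference, what this boils down to at the libvirt level is a 'hostdev'
interface carrying an 802.1Qbh virtualport, roughly like the following (the
PCI address, MAC, and profile id are placeholders):

    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:a1:b2:c3'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x1'/>
      </source>
      <virtualport type='802.1Qbh'>
        <parameters profileid='my-port-profile'/>
      </virtualport>
    </interface>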

Please take a look at them and see if they make sense, and let me know any 
comments and questions. We can also discuss this in the summit, I suppose.

I noticed that there is another thread on this topic, so I am copying those 
folks from that thread as well.

thanks,
Robert

On 10/16/13 4:32 PM, Irena Berezovsky ire...@mellanox.com wrote:

Hi,
As one of the next steps for PCI pass-through, I would like to discuss the 
support for PCI pass-through vNICs.
While nova takes care of PCI pass-through device resource management and VIF 
settings, neutron should manage their networking configuration.
I would like to register a summit proposal to discuss the support for PCI 
pass-through networking.
I am not sure what would be the right topic to discuss PCI pass-through 
networking, since it involves both nova and neutron.
There is already a session registered by Yongli on the nova topic to discuss 
the PCI pass-through next steps.
I think PCI pass-through networking is quite a big topic and it is worth 
having a separate discussion.
Are there any other people who are interested in discussing it and sharing 
their thoughts and experience?

Regards,
Irena

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-16 Thread Irena Berezovsky
Hi,
As one of the next steps for PCI pass-through, I would like to discuss the 
support for PCI pass-through vNICs.
While nova takes care of PCI pass-through device resource management and VIF 
settings, neutron should manage their networking configuration.
I would like to register a summit proposal to discuss the support for PCI 
pass-through networking.
I am not sure what would be the right topic to discuss PCI pass-through 
networking, since it involves both nova and neutron.
There is already a session registered by Yongli on the nova topic to discuss 
the PCI pass-through next steps.
I think PCI pass-through networking is quite a big topic and it is worth 
having a separate discussion.
Are there any other people who are interested in discussing it and sharing 
their thoughts and experience?

Regards,
Irena

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev