> On Fri, May 22, 2015 at 8:40 AM, Ciara Loftus <[email protected]> wrote:
> > This patch adds support for a new port type to the userspace
> > datapath called dpdkvhostuser.
> >
> > A new dpdkvhostuser port will create a unix domain socket which,
> > when provided to QEMU, is used to facilitate communication between
> > the virtio-net device on the VM and the OVS port on the host.
> >
> > vhost-cuse ('dpdkvhost') ports are still available, and will be
> > enabled if vhost-cuse support is detected in the DPDK build
> > specified during compilation of the switch. Otherwise, vhost-user
> > ports are enabled.
> >
> > v4:
> > - Included a helper function for the new_device callbacks to minimise
> >   code duplication.
> > - Fixed indentation & line-wrap.
> > - Simplified and corrected the processing of vhost ovs-vswitchd flags.
> >
> > v5:
> > - Removed unnecessary strdup()
> > - Fixed spacing
> >
> > v6:
> > - Rebased to master
> >
> > v7:
> > - Rebased to master
> >
> > Signed-off-by: Ciara Loftus <[email protected]>
> > ---
>
> Thanks for all the changes.
> The change log should not be part of the commit message; it should be
> written here, after the "---".
>
> >  INSTALL.DPDK.md         | 174 ++++++++++++++++++++++++++++++++++++++----------
> >  acinclude.m4            |   3 +
> >  lib/netdev-dpdk.c       | 153 ++++++++++++++++++++++++++++++--------
> >  lib/netdev.c            |   3 +-
> >  vswitchd/ovs-vswitchd.c |   5 ++
> >  5 files changed, 277 insertions(+), 61 deletions(-)
> >
> ...
> >
> >  Following the steps above to create a bridge, you can now add DPDK vhost
> > -as a port to the vswitch.
> > +as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost ports can have
> > +arbitrary names.
> > +
> > +When adding vhost ports to the switch, take care depending on which type of
> > +vhost you are using.
> >
> > -`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
> > + - For vhost-user (default), the name of the port type is `dpdkvhostuser`
> > +
> > +   ```
> > +   ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
> > +   type=dpdkvhostuser
> > +   ```
> >
> > -Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:
> > +   This action creates a socket located at
> > +   `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
> > +   to your VM on the QEMU command line. More instructions on this can be
> > +   found in the next section, "DPDK vhost-user VM configuration".
> > +   Note: If you wish for the vhost-user sockets to be created in a
> > +   directory other than `/usr/local/var/run/openvswitch`, you may specify
> > +   another location on the ovs-vswitchd command line like so:
> >
> > -`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
> > +   `./vswitchd/ovs-vswitchd --dpdk --vhost_sock_dir /my-dir -c 0x1 ...`
>
> Since we are going to deprecate cuse in a future release, we should
> switch the type naming: vhost-user should be dpdkvhost and cuse can be
> dpdkvhostcuse.
>
> > -However, please note that when attaching userspace devices to QEMU, the
> > -name provided during the add-port operation must match the ifname parameter
> > -on the QEMU command line.
> > + - For vhost-cuse, the name of the port type is `dpdkvhost`
> >
> > +   ```
> > +   ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
> > +   type=dpdkvhost
> > +   ```
> > +
> > +   When attaching vhost-cuse ports to QEMU, the name provided during the
> > +   add-port operation must match the ifname parameter on the QEMU command
> > +   line. More instructions on this can be found in the section "DPDK
> > +   vhost-cuse VM configuration".
> > +
> > +DPDK vhost-user VM configuration:
> > +---------------------------------
> > +Follow the steps below to attach vhost-user port(s) to a VM.
> >
> > -DPDK vhost VM configuration:
> > ----------------------------
> > +1. Configure sockets.
> > +   Pass the following parameters to QEMU to attach a vhost-user device:
> >
> > -  vhost ports use a Linux* character device to communicate with QEMU.
> > +   ```
> > +   -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
> > +   -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> > +   -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
> > +   ```
> > +
> > +   ...where vhost-user-1 is the name of the vhost-user port added
> > +   to the switch.
> > +   Repeat the above parameters for multiple devices, changing the
> > +   chardev path and id as necessary. Note that a separate and different
> > +   chardev path needs to be specified for each vhost-user device. For
> > +   example, if you have a second vhost-user port named 'vhost-user-2',
> > +   append your QEMU command line with an additional set of parameters:
> > +
> > +   ```
> > +   -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
> > +   -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
> > +   -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
> > +   ```
> > +
> > +2. Configure huge pages.
> > +   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
> > +   virtio-net device's virtual rings and packet buffers mapping the VM's
> > +   physical memory on hugetlbfs. To enable vhost ports to map the VM's
> > +   memory into their process address space, pass the following parameters
> > +   to QEMU:
> > +
> > +   ```
> > +   -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
> > +   share=on
> > +   -numa node,memdev=mem -mem-prealloc
> > +   ```
> > +
> > +DPDK vhost-cuse VM configuration:
> > +---------------------------------
> > +
> > +  vhost-cuse ports use a Linux* character device to communicate with QEMU.
> >    By default it is set to `/dev/vhost-net`. It is possible to reuse this
> >    standard device for DPDK vhost, which makes setup a little simpler but it
> >    is better practice to specify an alternative character device in order to
> > @@ -418,16 +521,19 @@ DPDK vhost VM configuration:
> >    QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
> >    virtio-net device's virtual rings and packet buffers mapping the VM's
> >    physical memory on hugetlbfs. To enable vhost-ports to map the VM's
> > -  memory into their process address space, pass the following paramters
> > +  memory into their process address space, pass the following parameters
> >    to QEMU:
> >
> >    `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
> >    share=on -numa node,memdev=mem -mem-prealloc`
> >
> > +  Note: For use with an earlier QEMU version such as v1.6.2, use the
> > +  following to configure hugepages instead:
> >
> > -DPDK vhost VM configuration with QEMU wrapper:
> > ----------------------------------------------
> > +  `-mem-path /dev/hugepages -mem-prealloc`
> >
> > +DPDK vhost-cuse VM configuration with QEMU wrapper:
> > +---------------------------------------------------
> >  The QEMU wrapper script automatically detects and calls QEMU with the
> >  necessary parameters. It performs the following actions:
>
> I think we need to better organize the documentation for DPDK vHost
> support. Users will be mostly interested in vhost-user, so there should
> be two sections under the vHost section: first the vhost-user
> configuration, then cuse. That way it is easier to follow one setup in
> a continuous fashion.
>
> > @@ -453,8 +559,8 @@ qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
> >      netdev=net1,mac=00:00:00:00:00:01
> > ```
>
> ...
>
> >  /*
> >   * Maximum amount of time in micro seconds to try and enqueue to vhost.
> > @@ -126,7 +127,8 @@ enum { DRAIN_TSC = 200000ULL };
> >
> >  enum dpdk_dev_type {
> >      DPDK_DEV_ETH = 0,
> > -    DPDK_DEV_VHOST = 1
> > +    DPDK_DEV_VHOST = 1,
> > +    DPDK_DEV_VHOST_USER = 2
> >  };
> >
> >  static int rte_eal_init_ret = ENODEV;
> > @@ -204,6 +206,9 @@ struct netdev_dpdk {
> >      /* virtio-net structure for vhost device */
> >      OVSRCU_TYPE(struct virtio_net *) virtio_dev;
> >
> > +    /* socket location for vhost-user device */
> > +    char socket_path[IF_NAME_SZ];
> > +
>
> Since this is a full path name, it is better to use PATH_MAX rather
> than the DPDK-defined IF_NAME_SZ.
>
> >      /* In dpdk_list. */
> >      struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
> >      rte_spinlock_t txq_lock;
> > @@ -530,6 +535,24 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
> >      netdev_->n_txq = NR_QUEUE;
> >      netdev_->n_rxq = NR_QUEUE;
> >
> > +    /* Take the name of the vhost-user port and append it to the location
> > +     * where the socket is to be created, then register the socket.
> > +     */
> > +    if (type == DPDK_DEV_VHOST_USER) {
> > +        snprintf(netdev->socket_path, sizeof(netdev->socket_path), "%s/%s",
> > +                 vhost_sock_dir, netdev_->name);
> > +        err = rte_vhost_driver_register(netdev->socket_path);
> > +        if (err) {
> > +            VLOG_ERR("vhost-user socket device setup failure for socket %s\n",
> > +                     netdev->socket_path);
> > +            goto unlock;
> > +        }
> > +
> > +        VLOG_INFO("Socket %s created for vhost-user port %s\n",
> > +                  netdev->socket_path, netdev_->name);
> > +    } else {
> > +        strncpy(netdev->socket_path, "", sizeof(netdev->socket_path));
> > +    }
> > +
>
> This can be moved to netdev_dpdk_vhost_user_construct() to avoid
> checking for DPDK_DEV_VHOST_USER. That way we can get rid of the
> DPDK_DEV_VHOST_USER enum value.
> >      if (type == DPDK_DEV_ETH) {
> >          netdev_dpdk_alloc_txq(netdev, NR_QUEUE);
> >          err = dpdk_eth_dev_init(netdev);
> > @@ -564,7 +587,7 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
> >  }
> >
> >  static int
> > -netdev_dpdk_vhost_construct(struct netdev *netdev_)
> > +vhost_construct_helper(struct netdev *netdev_, int type)
> >  {
> >      int err;
>
> ...
>
> > +
> > +static int
> > +netdev_dpdk_vhost_user_construct(struct netdev *netdev_)
> > +{
> > +    return vhost_construct_helper(netdev_, DPDK_DEV_VHOST_USER);
> > +}
> > +
> > +static int
> >  netdev_dpdk_construct(struct netdev *netdev)
> >  {
> >      unsigned int port_no;
> > @@ -1025,7 +1060,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet **pkts,
> >          ovs_mutex_unlock(&dev->mutex);
> >      }
> >
> > -    if (dev->type == DPDK_DEV_VHOST) {
> > +    if (dev->type == DPDK_DEV_VHOST || dev->type == DPDK_DEV_VHOST_USER) {
>
> It is slightly simpler if DPDK_DEV_VHOST_USER is removed.
>
> >          __netdev_dpdk_vhost_send(netdev, (struct dp_packet **) mbufs, newcnt, true);
> >      } else {
> >          dpdk_queue_pkts(dev, qid, mbufs, newcnt);
> > @@ -1544,15 +1579,17 @@ set_irq_status(struct virtio_net *dev)
> >   * A new virtio-net device is added to a vhost port.
> >   */
> >  static int
> > -new_device(struct virtio_net *dev)
> > +new_device_helper(struct virtio_net *dev, bool sock, int size)
> >  {
> >      struct netdev_dpdk *netdev;
> >      bool exists = false;
> > +    char* netdev_name;
> >
> >      ovs_mutex_lock(&dpdk_mutex);
> >      /* Add device to the vhost port with the same name as that passed down. */
> >      LIST_FOR_EACH(netdev, list_node, &dpdk_list) {
> > -        if (strncmp(dev->ifname, netdev->up.name, IFNAMSIZ) == 0) {
> > +        netdev_name = sock ? netdev->socket_path : netdev->up.name;
> > +        if (strncmp(dev->ifname, netdev_name, size) == 0) {
>
> We can store dev->ifname for vhost-cuse devices, so that new_device
> can be a common function for both types of devices.
> >              ovs_mutex_lock(&netdev->mutex);
> >              ovsrcu_set(&netdev->virtio_dev, dev);
> >              ovs_mutex_unlock(&netdev->mutex);
> > @@ -1577,6 +1614,18 @@ new_device(struct virtio_net *dev)
> >      return 0;
> >  }
> >
> > +static int
> > +new_device(struct virtio_net *dev)
> > +{
> > +    return new_device_helper(dev, false, IFNAMSIZ);
> > +}
> > +
> > +static int
> > +new_device_vhost_user(struct virtio_net *dev)
> > +{
> > +    return new_device_helper(dev, true, IF_NAME_SZ);
> > +}
> > +
> >  /*
> >   * Remove a virtio-net device from the specific vhost port. Use dev->remove
> >   * flag to stop any more packets from being sent or received to/from a VM and
> > @@ -1631,8 +1680,14 @@ static const struct virtio_net_device_ops virtio_net_device_ops =
> >      .destroy_device = destroy_device,
> >  };
> >
> > +const struct virtio_net_device_ops virtio_net_device_ops_vhost_user =
> > +{
> > +    .new_device = new_device_vhost_user,
> > +    .destroy_device = destroy_device,
> > +};
> > +
> >  static void *
> > -start_cuse_session_loop(void *dummy OVS_UNUSED)
> > +start_vhost_loop(void *dummy OVS_UNUSED)
> >  {
> >      pthread_detach(pthread_self());
> >      /* Put the cuse thread into quiescent state.
> >       */
> > @@ -1659,7 +1714,16 @@ dpdk_vhost_class_init(void)
> >          return -1;
> >      }
> >
> > -    ovs_thread_create("cuse_thread", start_cuse_session_loop, NULL);
> > +    ovs_thread_create("vhost_thread", start_vhost_loop, NULL);
> >      return 0;
> >  }
> >
> > +static int
> > +dpdk_vhost_user_class_init(void)
> > +{
> > +    rte_vhost_driver_callback_register(&virtio_net_device_ops_vhost_user);
> > +
> > +    ovs_thread_create("vhost_thread", start_vhost_loop, NULL);
> > +    return 0;
> > +}
> > +
> > @@ -1871,6 +1935,32 @@ unlock_dpdk:
> >          NULL,                       /* rxq_drain */            \
> >  }
> >
> > +static int
> > +process_vhost_flags(char *flag, char *default_val, int size, char **argv,
> > +                    char **new_val)
> > +{
> > +    int changed = 0;
> > +
> > +    /* Depending on which version of vhost is in use, process the
> > +     * vhost-specific flag if it is provided on the vswitchd command line,
> > +     * otherwise resort to a default value.
> > +     *
> > +     * For vhost-user: Process "--vhost_sock_dir" to set the custom location
> > +     * of the vhost-user socket(s).
> > +     * For vhost-cuse: Process "--cuse_dev_name" to set the custom name of
> > +     * the vhost-cuse character device.
> > +     */
> > +    if (!strcmp(argv[1], flag) && (strlen(argv[2]) <= size)) {
> > +        *new_val = strdup(argv[2]);
> > +        VLOG_INFO("User-provided %s in use: %s", flag, *new_val);
> > +        changed = 1;
> > +    } else {
> > +        *new_val = default_val;
> > +        VLOG_INFO("No %s provided - defaulting to %s", flag, default_val);
> > +    }
> > +
> > +    return changed;
> > +}
> > +
> >  int
> >  dpdk_init(int argc, char **argv)
> >  {
> > @@ -1885,27 +1975,20 @@ dpdk_init(int argc, char **argv)
> >      argc--;
> >      argv++;
> >
> > -    /* If the cuse_dev_name parameter has been provided, set 'cuse_dev_name' to
> > -     * this string if it meets the correct criteria. Otherwise, set it to the
> > -     * default (vhost-net).
> > -     */
> > -    if (!strcmp(argv[1], "--cuse_dev_name") &&
> > -        (strlen(argv[2]) <= NAME_MAX)) {
> > -
> > -        cuse_dev_name = strdup(argv[2]);
> > -
> > -        /* Remove the cuse_dev_name configuration parameters from the argument
> > +#ifdef VHOST_CUSE
> > +    if (process_vhost_flags("--cuse_dev_name", strdup("vhost-net"),
> > +                            PATH_MAX, argv, &cuse_dev_name)) {
> > +#else
> > +    if (process_vhost_flags("--vhost_sock_dir", strdup(ovs_rundir()),
> > +                            NAME_MAX, argv, &vhost_sock_dir)) {
>
> We need to check whether the directory exists, so that we can reject an
> invalid directory at vswitchd startup.
>
> > +#endif
> > +        /* Remove the vhost flag configuration parameters from the argument
> >           * list, so that the correct elements are passed to the DPDK
> >           * initialization function
> >           */
> >          argc -= 2;
> > -        argv += 2;  /* Increment by two to bypass the cuse_dev_name arguments */
> > +        argv += 2;  /* Increment by two to bypass the vhost flag arguments */
> >          base = 2;
> > -
> > -        VLOG_ERR("User-provided cuse_dev_name in use: /dev/%s", cuse_dev_name);
> > -    } else {
> > -        cuse_dev_name = "vhost-net";
> > -        VLOG_INFO("No cuse_dev_name provided - defaulting to /dev/vhost-net");
> >      }
> >
> >      /* Keep the program name argument as this is needed for call to
> > @@ -1974,6 +2057,20 @@ static const struct netdev_class dpdk_vhost_class =
> >          NULL,
> >          netdev_dpdk_vhost_rxq_recv);
> >
> > +const struct netdev_class dpdk_vhost_user_class =
> > +    NETDEV_DPDK_CLASS(
> > +        "dpdkvhostuser",
> > +        dpdk_vhost_user_class_init,
> > +        netdev_dpdk_vhost_user_construct,
> > +        netdev_dpdk_vhost_destruct,
> > +        netdev_dpdk_vhost_set_multiq,
> > +        netdev_dpdk_vhost_send,
> > +        netdev_dpdk_vhost_get_carrier,
> > +        netdev_dpdk_vhost_get_stats,
> > +        NULL,
> > +        NULL,
> > +        netdev_dpdk_vhost_rxq_recv);
> > +
>
> Since cuse support is going to be around for some time, all cuse-related
> functions should have "cuse" in their names, like the
> netdev_dpdk_vhost_user_* functions.
> Common functions can stay as just netdev_dpdk_vhost_*.
>
> >  void
> >  netdev_dpdk_register(void)
> >  {
> > @@ -1987,7 +2084,11 @@ netdev_dpdk_register(void)
> >          dpdk_common_init();
> >          netdev_register_provider(&dpdk_class);
> >          netdev_register_provider(&dpdk_ring_class);
> > +#ifdef VHOST_CUSE
> >          netdev_register_provider(&dpdk_vhost_class);
> > +#else
> > +        netdev_register_provider(&dpdk_vhost_user_class);
> > +#endif
> >          ovsthread_once_done(&once);
> >      }
> >  }
> > diff --git a/lib/netdev.c b/lib/netdev.c
> > index 45f7f29..24351b1 100644
> > --- a/lib/netdev.c
> > +++ b/lib/netdev.c
> > @@ -111,7 +111,8 @@ netdev_is_pmd(const struct netdev *netdev)
> >  {
> >      return (!strcmp(netdev->netdev_class->type, "dpdk") ||
> >              !strcmp(netdev->netdev_class->type, "dpdkr") ||
> > -            !strcmp(netdev->netdev_class->type, "dpdkvhost"));
> > +            !strcmp(netdev->netdev_class->type, "dpdkvhost") ||
> > +            !strcmp(netdev->netdev_class->type, "dpdkvhostuser"));
> >  }
> >
> >  static void
> > diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
> > index a1b33da..48651df 100644
> > --- a/vswitchd/ovs-vswitchd.c
> > +++ b/vswitchd/ovs-vswitchd.c
> > @@ -253,8 +253,13 @@ usage(void)
> >      vlog_usage();
> >      printf("\nDPDK options:\n"
> >             "  --dpdk options            Initialize DPDK datapath.\n"
> > +#ifdef VHOST_CUSE
> >             "  --cuse_dev_name BASENAME  override default character device name\n"
> >             "                            for use with userspace vHost.\n");
> > +#else
> > +           "  --vhost_sock_dir DIR      override default directory where\n"
> > +           "                            vhost-user sockets are created.\n");
>
> Since --cuse_dev_name and --vhost_sock_dir are sub-arguments under
> --dpdk, they should have a single-hyphen prefix.
Thanks for the review; please see the suggested changes implemented in
the v8 patch. I've chosen to keep the 'dpdkvhostuser' port name and
instead change the 'dpdkvhost' port to 'dpdkvhostcuse'. There tends to
be confusion sometimes around which port is in use, but with these names
it will be clearer to the user exactly which vhost they are using.

Thanks,
Ciara

_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev
