Hi Qi, I wanted to push this old series, but I still have some questions and comments. Please let's fix them quickly.
Note: I did not review the IPC mechanism and rollback. I trust you :) 28/09/2018 06:23, Qi Zhang: > --- a/doc/guides/rel_notes/release_18_11.rst > +++ b/doc/guides/rel_notes/release_18_11.rst > @@ -67,6 +67,12 @@ New Features > SR-IOV option in Hyper-V and Azure. This is an alternative to the previous > vdev_netvsc, tap, and failsafe drivers combination. > > +* **Support device multi-process hotplug.** > + > + Hotplug and hot-unplug for devices will now be supported in multiprocessing > + scenario. Any ethdev devices created in the primary process will be > regarded > + as shared and will be available for all DPDK processes. Synchronization > + between processes will be done using DPDK IPC. > A blank line is missing here. > API Changes > ----------- > @@ -91,6 +97,11 @@ API Changes > flag the MAC can be properly configured in any case. This is particularly > important for bonding. > > +* eal: scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended. > + > + In primary-secondary process model, ``rte_eal_hotplug_add`` will guarantee > + that device be attached on all processes, while ``rte_eal_hotplug_remove`` > + will guarantee device be detached on all processes. > Here too, double blank line before next heading. > --- a/lib/librte_eal/common/eal_common_dev.c > +++ b/lib/librte_eal/common/eal_common_dev.c > -int > -rte_eal_hotplug_add(const char *busname, const char *devname, > - const char *drvargs) > +/* help funciton to build devargs, caller should free the memory */ "help funciton" -> "helper function" > +static char * > +build_devargs(const char *busname, const char *devname, > + const char *drvargs) > { > char *devargs = NULL; > int size, length = -1; > @@ -140,19 +143,33 @@ rte_eal_hotplug_add(const char *busname, const char > *devname, > if (length >= size) > devargs = malloc(length + 1); > if (devargs == NULL) > - return -ENOMEM; > + break; > } while (size == 0); It is an old code, please rebase on master. > -int __rte_experimental > -rte_dev_remove(struct rte_device *dev) > +/* remove device at local process. */ > +int > +local_dev_remove(struct rte_device *dev) > { > struct rte_bus *bus; > int ret; > @@ -248,7 +268,194 @@ rte_dev_remove(struct rte_device *dev) > if (ret) > RTE_LOG(ERR, EAL, "Driver cannot detach the device (%s)\n", > dev->name); > - rte_devargs_remove(dev->devargs); > + else > + rte_devargs_remove(dev->devargs); It looks you are fixing a bug here. Good catch! > +int __rte_experimental > +rte_dev_probe(const char *devargs) > +{ > + struct eal_dev_mp_req req; > + struct rte_device *dev; > + int ret; > + > + memset(&req, 0, sizeof(req)); > + req.t = EAL_DEV_REQ_TYPE_ATTACH; > + strlcpy(req.devargs, devargs, EAL_DEV_MP_DEV_ARGS_MAX_LEN); > + > + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { > + /** > + * If in secondary process, just send IPC request to > + * primary process. > + */ > + ret = eal_dev_hotplug_request_to_primary(&req); > + if (ret) { > + RTE_LOG(ERR, EAL, > + "Failed to send hotplug request to primary\n"); > + return -ENOMSG; > + } > + if (req.result) > + RTE_LOG(ERR, EAL, > + "Failed to hotplug add device\n"); > + return req.result; > + } > + > + /* attach a shared device from primary start from here: */ > + > + /* primary attach the new device itself. */ > + ret = local_dev_probe(devargs, &dev); > + > + if (ret) { > + RTE_LOG(ERR, EAL, > + "Failed to attach device on primary process\n"); > + > + /** > + * it is possible that secondary process failed to attached a > + * device that primary process have during initialization, > + * so for -EEXIST case, we still need to sync with secondary > + * process. > + */ > + if (ret != -EEXIST) > + return ret; > + } > + > + /* primary send attach sync request to secondary. */ > + ret = eal_dev_hotplug_request_to_secondary(&req); > + > + /* if any commnunication error, we need to rollback. */ typo: communication > + if (ret) { > + RTE_LOG(ERR, EAL, > + "Failed to send hotplug add request to secondary\n"); > + ret = -ENOMSG; > + goto rollback; > + } > + > + /** > + * if any secondary failed to attach, we need to consider if rollback > + * is necessary. > + */ > + if (req.result) { > + RTE_LOG(ERR, EAL, > + "Failed to attach device on secondary process\n"); > + ret = req.result; > + > + /* for -EEXIST, we don't need to rollback. */ > + if (ret == -EEXIST) > + return ret; > + goto rollback; > + } > + > + return 0; > + > +rollback: > + req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK; > + > + /* primary send rollback request to secondary. */ > + if (eal_dev_hotplug_request_to_secondary(&req)) For all occurences of "if (function())", the coding style is requesting an explicit check of the return value: if (function() != 0) > + RTE_LOG(WARNING, EAL, > + "Failed to rollback device attach on secondary." > + "Devices in secondary may not sync with primary\n"); > + > + /* primary rollback itself. */ > + if (local_dev_remove(dev)) > + RTE_LOG(WARNING, EAL, > + "Failed to rollback device attach on primary." > + "Devices in secondary may not sync with primary\n"); > + > + return ret; > +} > + > +int __rte_experimental > +rte_dev_remove(struct rte_device *dev) > +{ > + struct eal_dev_mp_req req; > + char *devargs; > + int ret; > + > + devargs = build_devargs(dev->devargs->bus->name, dev->name, ""); > + if (devargs == NULL) > + return -ENOMEM; > + > + memset(&req, 0, sizeof(req)); > + req.t = EAL_DEV_REQ_TYPE_DETACH; > + strlcpy(req.devargs, devargs, EAL_DEV_MP_DEV_ARGS_MAX_LEN); > + free(devargs); > + > + if (rte_eal_process_type() != RTE_PROC_PRIMARY) { > + /** > + * If in secondary process, just send IPC request to > + * primary process. > + */ > + ret = eal_dev_hotplug_request_to_primary(&req); > + if (ret) { > + RTE_LOG(ERR, EAL, > + "Failed to send hotplug request to primary\n"); > + return -ENOMSG; > + } > + if (req.result) > + RTE_LOG(ERR, EAL, > + "Failed to hotplug remove device\n"); > + return req.result; > + } > + > + /* detach a device from primary start from here: */ > + > + /* primary send detach sync request to secondary */ > + ret = eal_dev_hotplug_request_to_secondary(&req); > + > + /** > + * if communication error, we need to rollback, because it is possible > + * part of the secondary processes still detached it successfully. > + */ > + if (ret) { ret is not a boolean, please do explicit check != 0. > + RTE_LOG(ERR, EAL, > + "Failed to send device detach request to secondary\n"); > + ret = -ENOMSG; > + goto rollback; > + } > + > + /** > + * if any secondary failed to detach, we need to consider if rollback > + * is necessary. > + */ > + if (req.result) { result is not a boolean, please do explicit check != 0. > + RTE_LOG(ERR, EAL, > + "Failed to detach device on secondary process\n"); > + ret = req.result; > + /** > + * if -ENOENT, we don't need to rollback, since devices is > + * already detached on secondary process. > + */ > + if (ret != -ENOENT) > + goto rollback; > + } > + > + /* primary detach the device itself. */ > + ret = local_dev_remove(dev); > + > + /* if primary failed, still need to consider if rollback is necessary */ > + if (ret) { > + RTE_LOG(ERR, EAL, > + "Failed to detach device on primary process\n"); > + /* if -ENOENT, we don't need to rollback */ > + if (ret == -ENOENT) > + return ret; > + goto rollback; > + } > + > + return 0; > + > +rollback: > + req.t = EAL_DEV_REQ_TYPE_DETACH_ROLLBACK; > + > + /* primary send rollback request to secondary. */ > + if (eal_dev_hotplug_request_to_secondary(&req)) > + RTE_LOG(WARNING, EAL, > + "Failed to rollback device detach on secondary." > + "Devices in secondary may not sync with primary\n"); > + > return ret; > } > --- a/lib/librte_eal/common/eal_private.h > +++ b/lib/librte_eal/common/eal_private.h > +/** > + * Register all mp action callbacks for hotplug. > + * > + * @return > + * 0 on success, negative on error. > + */ > +int rte_dev_hotplug_mp_init(void); This function is called by the init, so it should not be private. The app should be free to build its own init routine. > --- /dev/null > +++ b/lib/librte_eal/common/hotplug_mp.h > +#include <rte_dev.h> > +#include <rte_bus.h> I think EAL headers should be included with quotes. > +/** > + * this is a synchronous wrapper for secondary process send Missing uppercase at the beggining of sentence. > + * request to primary process, this is invoked when an attach > + * or detach request issued from primary process. Missing "is" before "issued": request is issued. > --- a/lib/librte_eal/common/include/rte_dev.h > +++ b/lib/librte_eal/common/include/rte_dev.h > @@ -190,6 +190,9 @@ int rte_eal_dev_detach(struct rte_device *dev); > > /** > * Hotplug add a given device to a specific bus. > + * In multi-process, this function will inform all other processes > + * to hotplug add the same device. Any failure on other process rollback > + * the action. Better to leave a blank line before this comment. Small reword: * Hotplug add a given device to a specific bus. * * In multi-process, it will request other processes to add the same device. * A failure, in any process, will rollback the action. You should add the same comment for rte_dev_probe(). > @@ -219,6 +222,9 @@ int __rte_experimental rte_dev_probe(const char *devargs); > > /** > * Hotplug remove a given device from a specific bus. > + * In multi-process, this function will inform all other processes > + * to hotplug remove the same device. Any failure on other process > + * will rollback the action. Same reword: * Hotplug remove a given device from a specific bus. * * In multi-process, it will request other processes to remove the same device. * A failure, in any process, will rollback the action. [...] > --- a/lib/librte_eal/linuxapp/eal/eal.c > +++ b/lib/librte_eal/linuxapp/eal/eal.c > + /* register mp action callbacks for hotplug */ > + if (rte_dev_hotplug_mp_init() < 0) { In the comment, better to say "multi-process" instead of the cryptic "mp".