Hi Shahaf,

On Tue, Jan 23, 2018 at 07:01:06PM +0200, Shahaf Shuler wrote:
> Following commit c7bf62255edf ("net/mlx5: fix handling link status event")
> the link state must be up in order for the burst function to be set on
> the device ops.
>
> As the link may take time to move between down and up state it is
> possible the rte_eth_dev_start call will return with wrong burst
> function (either null or the empty burst function).
>
> Fixing it by forcing the link to be up before returning from device
> start. In case the link is still not up after 5 seconds fail the function.
>
> Fixes: c7bf62255edf ("net/mlx5: fix handling link status event")
> Cc: ys...@mellanox.com
>
> Signed-off-by: Shahaf Shuler <shah...@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5.h         |  1 +
>  drivers/net/mlx5/mlx5_defs.h    |  3 +++
>  drivers/net/mlx5/mlx5_ethdev.c  | 27 +++++++++++++++++++++++++++
>  drivers/net/mlx5/mlx5_trigger.c |  8 +++++++-
>  4 files changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index a7ec607c3..30b737f76 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -246,6 +246,7 @@ int mlx5_dev_configure(struct rte_eth_dev *);
>  void mlx5_dev_infos_get(struct rte_eth_dev *, struct rte_eth_dev_info *);
>  const uint32_t *mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev);
>  int priv_link_update(struct priv *, int);
> +int priv_force_link_status_change(struct priv *, int);
>  int mlx5_link_update(struct rte_eth_dev *, int);
>  int mlx5_dev_set_mtu(struct rte_eth_dev *, uint16_t);
>  int mlx5_dev_get_flow_ctrl(struct rte_eth_dev *, struct rte_eth_fc_conf *);
> diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
> index a71db281d..57f295c58 100644
> --- a/drivers/net/mlx5/mlx5_defs.h
> +++ b/drivers/net/mlx5/mlx5_defs.h
> @@ -110,4 +110,7 @@
>  /* Supported RSS */
>  #define MLX5_RSS_HF_MASK (~(ETH_RSS_IP | ETH_RSS_UDP | ETH_RSS_TCP))
>
> +/* Maximum number of attempts to query link status before giving up. */
> +#define MLX5_MAX_LINK_QUERY_ATTEMPTS 5
> +
>  #endif /* RTE_PMD_MLX5_DEFS_H_ */
> diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
> index 6624888c9..523865d15 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -966,6 +966,33 @@ priv_link_update(struct priv *priv, int wait_to_complete)
>  }
>
>  /**
> + * Querying the link status till it changes to the desired state.
> + * Number of query attempts is bounded by MLX5_MAX_LINK_QUERY_ATTEMPTS.
> + *
> + * @param priv
> + *   Pointer to private structure.
> + * @param status
> + *   Link desired status.
> + *
> + * @return
> + *   0 on success, -1 on error.
> + */
> +int
> +priv_force_link_status_change(struct priv *priv, int status)
> +{
> +	int try = 0;
> +
> +	while (try < MLX5_MAX_LINK_QUERY_ATTEMPTS) {
> +		priv_link_update(priv, 0);
> +		if (priv->dev->data->dev_link.link_status == status)
> +			return 0;
> +		try++;
> +		sleep(1);
> +	}
> +	return -1;
> +}
> +
> +/**
>   * DPDK callback to retrieve physical link information.
>   *
>   * @param dev
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index 827db2e7e..c5429e182 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -166,7 +166,13 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>  	priv_xstats_init(priv);
>  	/* Update link status and Tx/Rx callbacks for the first time. */
>  	memset(&dev->data->dev_link, 0, sizeof(struct rte_eth_link));
> -	priv_link_update(priv, 1);
> +	INFO("Forcing port %u link to be up", dev->data->port_id);
> +	err = priv_force_link_status_change(priv, ETH_LINK_UP);
> +	if (err) {
> +		DEBUG("Failed to set port %u link to be up",
> +		      dev->data->port_id);
> +		goto error;
> +	}
>  	priv_dev_interrupt_handler_install(priv, dev);
>  	priv_unlock(priv);
>  	return 0;
> --
> 2.12.0

According to the documentation of mlx5_dev_start():

 * @return
 *   0 on success, negative errno value on failure.

This code returns -1 on error, which as a negative errno corresponds to:

 EPERM 1 /* Operation not permitted */

That is the wrong value for a link that did not come up in time. Why not
return a negative errno from your priv function, such as EBUSY or EAGAIN,
which would be more accurate?

Regards,

-- 
Nélio Laranjeiro
6WIND
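
To illustrate the suggestion, here is a minimal standalone sketch of the polling loop returning -EAGAIN instead of -1. It is not the driver code: struct fake_priv, fake_link_update() and the delay_s parameter are hypothetical stand-ins for struct priv, priv_link_update() and the fixed sleep(1), introduced only so the example builds without DPDK.

```c
#include <errno.h>
#include <unistd.h>

/* Mirrors MLX5_MAX_LINK_QUERY_ATTEMPTS from the patch. */
#define MAX_LINK_QUERY_ATTEMPTS 5

/* Hypothetical stand-in for struct priv; not the real driver structure. */
struct fake_priv {
	int link_status;    /* Last link state read: 0 = down, 1 = up. */
	int polls_until_up; /* Simulated queries that still report "down". */
};

/* Hypothetical stand-in for priv_link_update(): refreshes link_status. */
void
fake_link_update(struct fake_priv *priv)
{
	if (priv->polls_until_up > 0) {
		priv->polls_until_up--;
		priv->link_status = 0;
	} else {
		priv->link_status = 1;
	}
}

/*
 * Poll the link until it reaches the desired status.
 *
 * Returns 0 on success, or -EAGAIN (a negative errno value) if the link
 * has not reached that status after MAX_LINK_QUERY_ATTEMPTS queries.
 */
int
fake_force_link_status_change(struct fake_priv *priv, int status,
			      unsigned int delay_s)
{
	int attempt = 0;

	while (attempt < MAX_LINK_QUERY_ATTEMPTS) {
		fake_link_update(priv);
		if (priv->link_status == status)
			return 0;
		attempt++;
		sleep(delay_s);
	}
	return -EAGAIN;
}
```

With a return value of this form, the caller could presumably pass err through unchanged on the error path, so that the start function reports a negative errno as its documentation states.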