> > 09/10/2025 12:30, Loftus, Ciara: > > > > > > 03/10/2025 12:29, Loftus, Ciara: > > > > > > > Introduce a function that allows a VF to request the PF to reset > itself. > > > > > > > This is useful for example when the application detects that one > > > > > > > of > the > > > > > > > queues have hung or any event where a reset is required and the PF > is > > > > > > > unlikely to trigger it. > > > > > > > > > > [...] > > > > > > > > > > > > In general, I'm not a huge fan of adding driver-specific functions > > > > > > and I > > > > > > feel like this should fit under the existing reset APIs in some > > > > > > way. That > > > > > > should avoid the need to update (or such a big update) to testpmd, > for > > > > > > example. > > > > > > Some thoughts here: > > > > > > > > > > > > 1. we could add a devarg to the driver to adjust whether reset does > > > > > > a > > > > > > "softer" reset of the VF just resetting itself, or a "hard" > > > > > > reset where > the > > > > > > PF does a fuller reset of the VF. > > > > > > 2. rather than a devarg, would do use a driver-specific function to > adjust > > > > > > this behaviour. The difference here would be that the > > > > > > driver-specific > > > > > > function would be an init-time one rather than runtime, so the > runtime > > > of > > > > > > the app, like testpmd, would be generic. > > > > > > 3. the most generic solution would be to add an additional parameter > to > > > the > > > > > > reset() function itself to specify a hard or soft reset. This > > > > > > would > mean > > > > > > updating all drivers to handle the new parameter (shouldn't be > > > > > > hard, > > > since > > > > > > it would be __rte_unused in all cases by default). This also > > > > > > opens up > > > > > > the possibility of other drivers - especially VFs - using it in > > > > > > the same > > > > > > way. We could actually document that the "hard" option "may be > used > > > by > > > > > VF > > > > > > drivers to request a full reset of the VF by the PF". > > > > > > > > > > Hi Bruce, > > > > > > > > > > I did not make it clear in my commit messages the full purpose of this > new > > > API. > > > > > Along with resetting the VF, the VF is also reconfigured and restarted > > > > > transparently, using the existing iavf_handle_hw_reset implementation. > > > > > While there is probably merit in extending the reset API to include a > > > soft/hard > > > > > reset flag, I would still need this new API to fulfil the purpose > > > > > described > > > above. > > > > > If we wanted to make this generic I see two options: > > > > > 1. extend the reset API to optionally reconfigure and restart > > > > > 2. introduce a new generic API that performs a reset followed by a > > > reconfigure > > > > > and restart. > > > > > > > > I've submitted a v2 and renamed the function to "restore" instead of > "reset" > > > to > > > > better describe the behaviour of the function - not just resetting, but > > > > also > > > > restoring configuration afterwards and restarting the device. > > > > > > > > I agree that it would be best to avoid a driver-specific function like > > > > this. > After > > > some > > > > thought I don't think we can extend the reset API to include this > behaviour, > > > the > > > > reset API should probably leave the device in a reset state. > > > > > > > > I'm wondering if the community would be in favour of creating a new > ethdev > > > API > > > > for restoring/reinitialising a device. It would essentially comprise of > > > > reset, > > > > reconfigure and queue setup, and optionally device start? > > > > > > It is confusing to add an API if it is a perfect overlap > > > of what exists already. > > > If you want to add a new API, you need to explain how it is different > > > of a combination of reset/configure/start. > > > > Apologies for any confusion. > > The new API would perform only the necessary reconfiguration > > and setup steps under-the-hood and use the existing device > > configuration. This would differ from a combination of > > reset/reconfigure/start in two ways: > > 1. In some cases (at least in the case of the iavf driver), a > > full rte_eth_dev_configure is not required after an rte_eth_dev_reset. > > So the new API could just call the device callbacks for those functions, > > rather than forcing the user to call the ethdev functions and perform > > some unnecessary reconfiguration steps. > > 2. It takes the burden off the user to build up the device configuration > > again after the reset. Any pertinent configuration can be stored before > > the reset and used to reconfigure the device transparently. All the user > > needs to supply is the port id. > > > > Anyway, I've submitted a v3 of the non-generic version of this for the > > iavf driver, so if there is no appetite for the generic approach I'll > > continue to pursue the driver-specific one. > > A generic API is way better. > > I'm trying to understand what is missing in the current API. > > I think what you want to achieve is > RTE_ETH_ERROR_HANDLE_MODE_PROACTIVE. > What is missing exactly to implement it with the generic API? >
Thanks for the recommendation, I think there is probably a place in the iavf driver for RTE_ETH_ERROR_HANDLE_MODE_PROACTIVE when we detect a PF-to-VF reset. However I don't think it lends itself to this problem. What is missing is the trigger for the sequence to start. RTE_ETH_ERROR_HANDLE_MODE_PROACTIVE seems to be concerned with recovering from hardware errors detected by the PMD. There doesn't seem to be a way to ask the PMD to enter the recovery mode, you must wait for a detected error - please correct me if I am wrong. What I am looking to implement is an API that will allow the user to trigger the PMD to perform a reset followed by recovery (configure, setup and device start). Yes this can be done using existing APIs: rte_eth_dev_reset rte_eth_dev_configure rte_eth_rx_queue_setup rte_eth_tx_queue_setup rte_eth_dev_start But that involves potentially unnecessary reconfiguration and more burden on the user in building up the device conf, versus an API that just accepts the port_id. The user may want to tear down the device and bring it straight back up again in cases where something has gone wrong eg. a queue appears to be hung, and resetting it may help resolve the issue.

