On Fri, Mar 13, 2026 at 01:59:28PM -0300, Jason Gunthorpe wrote:
> On Sat, Mar 07, 2026 at 07:38:14PM +0200, Leon Romanovsky wrote:
> > On Fri, Mar 06, 2026 at 05:47:14PM -0800, Long Li wrote:
> > > When the MANA hardware undergoes a service reset, the ETH auxiliary device
> > > (mana.eth) used by DPDK persists across the reset cycle — it is not 
> > > removed
> > > and re-added like RC/UD/GSI QPs. This means userspace RDMA consumers such
> > > as DPDK have no way of knowing that firmware handles for their PD, CQ, WQ,
> > > QP and MR resources have become stale.
> > 
> > NAK to any of this.
> > 
> > In case of hardware reset, mana_ib AUX device needs to be destroyed and
> > recreated later.
> 
> Yeah, that is our general model for any serious RAS event where the
> driver's view of resources becomes out of sync with the HW.
> 
> You have tear down the ib_device by removing the aux and then bring
> back a new one.
> 
> There is an IB_EVENT_DEVICE_FATAL, but the purpose of that event is to
> tell userspace to close and re-open their uverbs FD.
> 
> We don't have a model where a uverbs FD in userspace can continue to
> work after the device has a catasrophic RAS event.
> 
> There may be room to have a model where the ib device doesn't fully
> unplug/replug so it retains its name and things, but that is core code
> not driver stuff.

Good luck with that model. It is going to break RDMA-CM hotplug support.

Thanks

> 
> Jason
> 

Reply via email to