On Fri, Mar 13, 2026 at 01:59:28PM -0300, Jason Gunthorpe wrote:
> On Sat, Mar 07, 2026 at 07:38:14PM +0200, Leon Romanovsky wrote:
> > On Fri, Mar 06, 2026 at 05:47:14PM -0800, Long Li wrote:
> > > When the MANA hardware undergoes a service reset, the ETH auxiliary
> > > device (mana.eth) used by DPDK persists across the reset cycle — it
> > > is not removed and re-added like RC/UD/GSI QPs. This means userspace
> > > RDMA consumers such as DPDK have no way of knowing that firmware
> > > handles for their PD, CQ, WQ, QP and MR resources have become stale.
> >
> > NAK to any of this.
> >
> > In case of hardware reset, mana_ib AUX device needs to be destroyed and
> > recreated later.
>
> Yeah, that is our general model for any serious RAS event where the
> driver's view of resources becomes out of sync with the HW.
>
> You have to tear down the ib_device by removing the aux and then bring
> back a new one.
>
> There is an IB_EVENT_DEVICE_FATAL, but the purpose of that event is to
> tell userspace to close and re-open their uverbs FD.
>
> We don't have a model where a uverbs FD in userspace can continue to
> work after the device has a catastrophic RAS event.
>
> There may be room to have a model where the ib device doesn't fully
> unplug/replug so it retains its name and things, but that is core code
> not driver stuff.

Good luck with that model. It is going to break RDMA-CM hotplug support.

Thanks

>
> Jason
>

