>Tung, Chien Tin wrote:
>>> In other words, I think we want the ucma context to stay around until
>>> the application destroys it (via explicit means or via exit). But the
>>> rdma_cm_id gets destroyed immediately upon receiving a DEVICE_REMOVE event.
>>>
>>
>> How do we "take care" of evil applications that won't go away?
>>
>
>Since the low level rdma_cm_id _is_ destroyed, then the RDMA device can
>unload and go away. The evil app then only is wasting the ucma
>contexts, file descriptors, etc.

Yup, let someone else hold the bag...

>Now, we could also consider something more abrupt like delivering a
>SIGABRT or SIGBUS to all processes that have objects allocated for the
>device that is going away. But that is kind of drastic, and if done
>unconditionally, will kill applications that want to process the device
>removal and free the objects using that device, but still want to
>continue running on the available devices. So I wouldn't recommend we
>deliver fatal signals...

I'm against violence as well. But to Roland's point, how will we unmap
resources? If an application won't respond to the device removal event and
clean up properly, perhaps it is "okay" to let it crash.

Alternatively, what about an RDMA_CM_EVENT_DEVICE_REMOVAL_PENDING and
RDMA_CM_EVENT_DEVICE_REMOVED scheme? Post the first event to allow good
applications to clean up, then the second event to notify apps that the
device is "gone". After the second event, we can then get violent and
shoot to kill?

Chien
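
As a rough sketch of the two-event scheme proposed above, here is a small standalone C model. Nothing in it is an existing librdmacm or kernel interface: the EV_* codes stand in for the proposed RDMA_CM_EVENT_DEVICE_REMOVAL_PENDING and RDMA_CM_EVENT_DEVICE_REMOVED events, and struct app_ctx, deliver_event() and remove_device() are hypothetical names made up for illustration. The "killed" flag only stands in for whatever fatal action would be taken after the second event.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes modelling the proposal; not part of librdmacm. */
enum removal_event {
    EV_DEVICE_REMOVAL_PENDING,  /* first notice: please clean up */
    EV_DEVICE_REMOVED           /* second notice: the device is gone */
};

/* Hypothetical per-application bookkeeping, for illustration only. */
struct app_ctx {
    bool handles_removal;   /* does the app react to the PENDING event? */
    bool resources_freed;   /* has everything on the device been released? */
    bool killed;            /* was the app forcibly terminated? */
};

/* Deliver one event to one application and update its state. */
void deliver_event(struct app_ctx *app, enum removal_event ev)
{
    switch (ev) {
    case EV_DEVICE_REMOVAL_PENDING:
        /* A well-behaved app frees its objects on the first event. */
        if (app->handles_removal)
            app->resources_freed = true;
        break;
    case EV_DEVICE_REMOVED:
        /* Apps still holding resources get the drastic treatment. */
        if (!app->resources_freed) {
            app->killed = true;           /* stands in for a fatal signal */
            app->resources_freed = true;  /* reclaimed when the app dies */
        }
        break;
    }
}

/* Run both phases over all apps; return how many had to be killed. */
size_t remove_device(struct app_ctx *apps, size_t n)
{
    size_t killed = 0;

    /* Phase 1: give every app the chance to clean up. */
    for (size_t i = 0; i < n; i++)
        deliver_event(&apps[i], EV_DEVICE_REMOVAL_PENDING);

    /* Phase 2: announce removal and count the apps that ignored phase 1. */
    for (size_t i = 0; i < n; i++) {
        deliver_event(&apps[i], EV_DEVICE_REMOVED);
        if (apps[i].killed)
            killed++;
    }
    return killed;
}
```

In this model an app that handles the first event keeps running on the remaining devices, and only an app that ignored it is terminated at the second event. A real implementation would presumably also need some timeout or acknowledgement between the two phases, which the sketch leaves out.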
