Hi Michele and others, I am trying to implement the approach below to resolve the AMDGPU hang that occurs when commands are stuck in the pipe during process exit.

I noticed that once I implemented the file_operations.flush callback, then while X is running I see the flush callback get called not only for the Xorg process but also for other processes such as 'xkbcomp' and even 'sh'. It seems that Xorg passes its FDs to its children. Christian mentioned that he remembered a discussion about always setting the FD_CLOEXEC flag when opening the hardware device file, so we suspect a bug in Xorg with regard to this behavior.
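For reference, here is a minimal userspace sketch of what "always set FD_CLOEXEC" means in practice. The helper names (open_cloexec, set_cloexec, is_cloexec) are made up for illustration and are not taken from Xorg or libdrm; only open(), fcntl(), O_CLOEXEC and FD_CLOEXEC are real interfaces.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open a device node with close-on-exec set atomically,
 * so the descriptor does not survive an exec() in a child process. */
static int open_cloexec(const char *path)
{
    return open(path, O_RDWR | O_CLOEXEC);
}

/* Hypothetical helper: set FD_CLOEXEC on an already-open descriptor.
 * Returns 0 on success, -1 on error. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Hypothetical helper: returns 1 if FD_CLOEXEC is set, 0 if not, -1 on error. */
static int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

Note that the flag only affects exec(): a forked child still holds its own copy of the descriptor until it execs or closes it, so this sketch alone does not establish which processes end up triggering the flush callback.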

Any advice on this would be very helpful.


On 05/02/2018 07:48 AM, Christian König wrote:
I suggest the following approach:
1. Implement the flush callback and call the function to wait for the scheduler to push everything to the hardware (maybe rename the scheduler function to flush as well).

2. Change the scheduler to test for PF_EXITING: if it's set, use wait_event_timeout(); if it isn't set, use wait_event_killable().

When the wait times out or is killed, set a flag so that the _fini function knows that. Alternatively, you could clean up the _fini function to work in all cases, i.e. both when there are still jobs on the queue and when the queue is empty. For this you need to add something like a struct completion to the main loop to remove this start()/stop() of the kernel thread.


