w1049 commented on issue #18157: URL: https://github.com/apache/tvm/issues/18157#issuecomment-3121137934
> Thanks [@w1049](https://github.com/w1049) this is very interesting! Do you mind also create a minimum example that just uses `PopenPoolExecutor`. Your suggested temp fix works, please send a PR, we should also document this case in the shutdown() function

This is an example that just uses `PopenPoolExecutor`.

```python
from tvm.contrib.popen_pool import PopenPoolExecutor, StatusKind
import sys
import gc


def func(x):
    if x == 0:
        return x
    raise ValueError("This is a test error")


while True:
    pool = PopenPoolExecutor()
    for map_result in pool.map_with_error_catching(
        lambda x: func(x),
        range(2),
    ):
        if map_result.status == StatusKind.COMPLETE:
            print(f"Completed with {map_result.value}")
        elif map_result.status == StatusKind.EXCEPTION:
            print(f"Exception raised: {map_result.value}")
        else:
            print(f"Unexpected status: {map_result.status}")
    print("Finished, trying to delete pool...")
    print("Ref count:", sys.getrefcount(pool))
    print("Referrers:", gc.get_referrers(pool))
    del pool  # decrement the reference count
    print("After `del pool'")
```

It demonstrates the following scenario from the Python documentation: when an exception occurs in a worker function, `del` cannot immediately delete the pool.

> CPython implementation detail: It is possible for a reference cycle to prevent the reference count of an object from going to zero. In this case, the cycle will be later detected and deleted by the [cyclic garbage collector](https://docs.python.org/3/glossary.html#term-garbage-collection). A common cause of reference cycles is when an exception has been caught in a local variable. The frame’s locals then reference the exception, which references its own traceback, which references the locals of all frames caught in the traceback.
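
The cycle described in that quote can be reproduced without TVM. The following is a minimal sketch, not the TVM code path: `Resource` and `work` are hypothetical stand-ins for the pool and for a function that catches an exception into a local variable. It assumes CPython, where reference counting normally frees objects as soon as `del` drops the last reference.

```python
import gc
import weakref


class Resource:
    """Hypothetical stand-in for the pool."""


def work(res):
    try:
        raise ValueError("This is a test error")
    except ValueError as exc:
        # Keeping the exception in a local creates the cycle:
        # frame locals -> exception -> traceback -> frame.
        # The frame's locals also include `res`, so `res` stays alive.
        saved = exc
    return None


gc.disable()  # keep the cyclic collector from running at an arbitrary point
res = Resource()
alive = weakref.ref(res)
work(res)

del res  # drops the caller's reference, but the leaked frame still holds one
after_del = alive() is not None
print("alive after del:", after_del)

gc.collect()  # the cyclic collector breaks the cycle and frees the object
after_gc = alive() is not None
print("alive after gc.collect():", after_gc)
gc.enable()
```

The object only goes away when the cyclic collector runs, which is why the point at which the old pool's `shutdown()` gets invoked is not under the caller's control.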
Thus the conditions for this deadlock are:

- An exception occurs during a build, typically due to an invalid config.
- In the next build, while a newly created pool is inside the function `_maintain_shutdown_locks()`, GC happens to clean up the previous pool and invokes `shutdown()`.

I will send a PR soon.
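
The second condition can be illustrated in isolation. This is a rough sketch under stated assumptions, not the actual TVM or CPython internals: `registry_lock` is a hypothetical non-reentrant lock standing in for whatever lock is held at that point, and `Pool` is a hypothetical class whose finalizer needs the same lock. A timeout is used so the demo reports the problem instead of hanging.

```python
import gc
import threading

# Hypothetical stand-in for a non-reentrant internal lock.
registry_lock = threading.Lock()
events = []


class Pool:
    """Hypothetical pool whose finalizer needs `registry_lock`."""

    def __init__(self):
        # Self-reference: only the cyclic GC can reclaim this object.
        self.cycle = self

    def __del__(self):
        # A real finalizer would block forever here; the timeout keeps the demo finite.
        acquired = registry_lock.acquire(timeout=0.1)
        events.append("acquired" if acquired else "would deadlock")
        if acquired:
            registry_lock.release()


gc.disable()  # make the collection point deterministic for the demo
Pool()  # immediately becomes garbage, but is kept alive by its cycle

with registry_lock:
    # Simulates GC being triggered while the current thread holds the lock:
    # the finalizer runs in this same thread and cannot take the lock again.
    gc.collect()

gc.enable()
print(events)
```

Because the finalizer runs in the thread that already holds the lock, waiting longer cannot help; without the timeout the thread would wait on itself indefinitely.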
