westonpace commented on PR #13691: URL: https://github.com/apache/arrow/pull/13691#issuecomment-1194233235
I investigated the failing ASAN run further today. I'm able to reproduce it on master so it is unrelated to this change. The root problem is the same one that motivates ARROW-16072. From the JIRA: > We've hit a bit of a wall with the merged generator. The current behavior is: If one subscription encounters an error we simply stop pulling from the other subscriptions. Once everything has settled down we return the error and end the stream. > > In reality, we should be sending some kind of cancel signal down to the other generators. Otherwise, we are not respecting the rule for AsyncGenerator that we recently defined which is "An AsyncGenerator should always be fully consumed". The end result of this is that some bits of the scanner keep running in the event of a failure or an abort. These pieces keep their state alive via shared_ptr so it doesn't actually cause any exceptions. However, it can lead to cases like this memory leak. The last test in ScannerTest will test a failed scan. This causes some bits to be left running. Then the test ends and the process shuts down and the thread pool shuts down immediately. This causes some cleanup thread tasks to fail to get scheduled or not run. As a result there is some leakage at shutdown. This is only a shutdown leak and so pretty minor. ARROW-16072 will tie cancellation into the scanner so that once a plan is aborted and the user waits for it to finish then nothing will be left running and there will be no possibility for this. Long story short, I think we can proceed with merging this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
