Re: [PR] [SPARK-55620][PYTHON][CONNECT] Fix deadlock in test_connect_session d… [spark]
Yicong-Huang commented on PR #54636: URL: https://github.com/apache/spark/pull/54636#issuecomment-4009017800 Sounds good thanks. let's close the ticket. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [SPARK-55620][PYTHON][CONNECT] Fix deadlock in test_connect_session d… [spark]
gaogaotiantian commented on PR #54636: URL: https://github.com/apache/spark/pull/54636#issuecomment-4008902180 #54279 was merged last week -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [SPARK-55620][PYTHON][CONNECT] Fix deadlock in test_connect_session d… [spark]
Yicong-Huang commented on PR #54636: URL: https://github.com/apache/spark/pull/54636#issuecomment-4008883926 @gaogaotiantian I reported the issue. and the hang was observed on this run. https://github.com/Yicong-Huang/spark/actions/runs/22196593939, which was 2 weeks ago and after #54279. Can you double check if this is fixed? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [SPARK-55620][PYTHON][CONNECT] Fix deadlock in test_connect_session d… [spark]
godeomt commented on PR #54636: URL: https://github.com/apache/spark/pull/54636#issuecomment-4008544292 Thank you for your warm words and encouragement! I will definitely try again. Closing this PR now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [SPARK-55620][PYTHON][CONNECT] Fix deadlock in test_connect_session d… [spark]
godeomt closed pull request #54636: [SPARK-55620][PYTHON][CONNECT] Fix deadlock in test_connect_session d… URL: https://github.com/apache/spark/pull/54636 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [SPARK-55620][PYTHON][CONNECT] Fix deadlock in test_connect_session d… [spark]
gaogaotiantian commented on PR #54636: URL: https://github.com/apache/spark/pull/54636#issuecomment-4008528124 @godeomt , no worries. deadlock is often a relatively difficult type of issue to find the root cause. It normally requires some expertise in the domain. It's very common that even the experienced engineers have the wrong thoughts. The reason I asked about LLM is just to confirm that this could potentially be some hallucination from AI (it looks like one). I believe the that specific flakiness issue was already resolved. Feel free to observe the CI results to confirm that. Also thank you for taking interests in improving spark. Even though this PR probably won't be merged, you'll still have a lot of chances in the future to become a contributor! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [SPARK-55620][PYTHON][CONNECT] Fix deadlock in test_connect_session d… [spark]
godeomt commented on PR #54636: URL: https://github.com/apache/spark/pull/54636#issuecomment-4008456021 Hi, thank you so much for the detailed review and for pointing out the technical flaws in my reasoning! To answer your questions honestly: - Regarding PR #54279 and reproduction: To be completely transparent, I missed PR #54279. I spent a lot of time trying to reproduce the hanging issue locally in various ways, but I couldn't get it to fail at all. Because I couldn't reproduce the bug after multiple attempts, I eventually turned to an LLM for help to understand the potential root cause based on the old CI logs. - Regarding the root cause: You are absolutely right. If Python tries to spawn a thread during finalization, it raises a RuntimeError rather than deadlocking. My assumption (and the AI's hallucination) about self._channel.close() spawning a new thread and causing an infinite hang was technically incorrect. I lacked the deep domain knowledge to properly verify the AI's theory. I really appreciate you taking the time to explain why the theory was flawed. It is a huge learning experience for me as a first-time contributor. If PR #54279 already addresses this flakiness, I am more than happy to close this PR. Thank you again for your patience and guidance! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [SPARK-55620][PYTHON][CONNECT] Fix deadlock in test_connect_session d… [spark]
gaogaotiantian commented on PR #54636: URL: https://github.com/apache/spark/pull/54636#issuecomment-4007803652 I have some doubts about the root cause. First of all, did you observe any test hanging after https://github.com/apache/spark/pull/54279 ? Why do you believe `self._channel.close()` will spawn a *new* thread to clean up the ongoing gRPCs, instead of using the existing thread? If a python process is trying to spawn a new thread during finalization phase, it should raise an exception, instead of block forever. If you don't mind me asking, did you come up with the theory by yourself, or with the help of LLM? It might help with our discussion. Thank you for taking a look at the issues. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
