pzhdfy opened a new pull request, #55302: URL: https://github.com/apache/doris/pull/55302
### What problem does this PR solve?

Related PR: https://github.com/apache/doris/pull/30082

Problem Summary:

We run Doris 3.0.4 with Java UDFs. After running for a long time, some BEs hit JVM deadlocks whose lock owner is reported as UNKNOWN_owner_addr.

Deadlock in UDF close:

<img width="2314" height="882" alt="image" src="https://github.com/user-attachments/assets/c16b5d18-9900-45dc-8bb4-894180694a44" />

Deadlock in throwableToStackTrace, waiting on a StringWriter. This should be impossible, because a new StringWriter is created on every call:

<img width="1724" height="720" alt="image" src="https://github.com/user-attachments/assets/b586f752-0ba8-43cc-89b4-352a7d9ab00c" />
<img width="618" height="144" alt="image" src="https://github.com/user-attachments/assets/20c2ad11-beef-4b0c-b2a2-343d9b0fd14c" />

In be.WARNING we found that getting the JNIEnv failed when closing the UDF:

<img width="1659" height="426" alt="image" src="https://github.com/user-attachments/assets/6cf54aa2-522a-407e-a53a-e17fdbc9162e" />

In be.out we found libhdfs error logs (on x86, the JNIEnv is obtained through libhdfs):

Call to AttachCurrentThread failed with error: -1

getJNIEnv: getGlobalJNIEnv failed

In be.WARNING there are also other errors raised while using GetJniExceptionMsg:

<img width="1686" height="530" alt="image" src="https://github.com/user-attachments/assets/35ca7b4c-0a9d-4da3-8be2-9a07ccebb74b" />

All of these errors have two things in common:

1. Every case occurred in JavaFunctionCall::close.
2. Every stack contains the bthread keyword.

After searching the web, we found that JNI is not compatible with bthread:

https://blog.csdn.net/qq_46104835/article/details/139360911

https://gitee.com/baidu/BRPC/blob/master/docs/cn/server.md#pthread%E6%A8%A1%E5%BC%8F

<img width="1101" height="211" alt="image" src="https://github.com/user-attachments/assets/e1da2f9a-9e37-47d9-b0a1-c1652da93e75" />

We then switched from bthread to pthread mode, and everything worked fine.

To find out how often JavaFunctionCall::close runs in a bthread, we added metrics.
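A minimal sketch of the kind of counter such a metric could be built on, not the code in this PR. `CloseStats` and `g_on_bthread` are hypothetical names; the thread-local flag stands in for a real "am I on a bthread?" check (in a brpc-based process this might be something like `bthread_self() != 0`, but that is an assumption here).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a real "is the caller a bthread?" check.
thread_local bool g_on_bthread = false;

// Counts how many close() calls happen in total and how many of them
// happen while running on a bthread.
struct CloseStats {
    std::atomic<uint64_t> total{0};
    std::atomic<uint64_t> in_bthread{0};

    // Call once at the start of every close().
    void record() {
        total.fetch_add(1, std::memory_order_relaxed);
        if (g_on_bthread) {
            in_bthread.fetch_add(1, std::memory_order_relaxed);
        }
    }

    // Fraction of close() calls that ran on a bthread (0.0 if none recorded).
    double bthread_ratio() const {
        const uint64_t t = total.load(std::memory_order_relaxed);
        if (t == 0) {
            return 0.0;
        }
        return static_cast<double>(in_bthread.load(std::memory_order_relaxed)) /
               static_cast<double>(t);
    }
};
```

Relaxed atomics are enough here because the counters are only read for reporting and do not order any other memory accesses.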
Only about 1 in 10000 JavaFunctionCall::close calls runs in a bthread:

<img width="1091" height="205" alt="image" src="https://github.com/user-attachments/assets/2f7d996c-c2bb-425d-9a07-92f0920fe018" />

But why does JavaFunctionCall::close run in a bthread at all? (Since https://github.com/apache/doris/issues/16634, exec_plan_fragment runs in a pthread instead of a bthread.)

We then found PR https://github.com/apache/doris/pull/30082: ExchangeSinkBuffer<Parent>::_send_rpc sets a send_callback that holds a weak_task_ctx.

<img width="1392" height="254" alt="image" src="https://github.com/user-attachments/assets/a1d28f18-c3c7-4953-baab-d1109f341aa1" />

So send_callback sometimes calls weak_task_ctx.lock() to obtain a shared_ptr to task_ctx. If that shared_ptr ends up being the last reference, the task_ctx destructor may be called inside send_callback, which runs in a bthread.

So we modify JavaFunctionCall::close: when it runs in a bthread, we submit the JNI operation to a pthread pool and wait for it to finish. Because only about 1 in 10000 JavaFunctionCall::close calls runs in a bthread, the pthread pool size can be set to a small number.

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here.
eg: https://github.com/apache/doris-website/pull/1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
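For reference, the approach described in the Problem Summary (run the JNI work inline on a normal thread, hand it to a small pthread pool and wait when the caller is a bthread) could look roughly like the sketch below. This is an illustration using only the C++ standard library, not the code in this PR: `PthreadPool`, `close_udf`, and `g_on_bthread` are hypothetical names, and the thread-local flag stands in for a real bthread check.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for a real "is the caller a bthread?" check.
thread_local bool g_on_bthread = false;

// Tiny fixed-size pool of ordinary OS threads (illustrative, not Doris's pool).
class PthreadPool {
public:
    explicit PthreadPool(size_t n) {
        for (size_t i = 0; i < n; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }
    ~PthreadPool() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    // Queue a task and return a future the caller can block on.
    std::future<void> submit(std::function<void()> fn) {
        auto task = std::make_shared<std::packaged_task<void()>>(std::move(fn));
        std::future<void> fut = task->get_future();
        {
            std::lock_guard<std::mutex> lk(mu_);
            tasks_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return fut;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(mu_);
                cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested and queue drained
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();
        }
    }
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stop_ = false;
};

// Runs `jni_close` inline on a normal thread; when the caller is a bthread,
// runs it on the pool instead and waits. Returns true if it was offloaded.
bool close_udf(PthreadPool& pool, const std::function<void()>& jni_close) {
    if (!g_on_bthread) {
        jni_close();
        return false;
    }
    pool.submit(jni_close).get();  // block until the pool thread finishes
    return true;
}
```

Because the offloaded path is rare, a pool of one or two threads is likely sufficient in a sketch like this. Note that waiting on a `std::future` blocks the underlying worker thread for the duration of the JNI call; the primitive the PR itself uses for the wait is not stated above.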
