hudi-bot opened a new issue, #16221: URL: https://github.com/apache/hudi/issues/16221
## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-6820
- Type: Bug
- Epic: https://issues.apache.org/jira/browse/HUDI-4302

---

## Comments

06/Sep/23 17:17, ljain: Siva tried disabling tests via PRs [https://github.com/apache/hudi/pull/9543] [https://github.com/apache/hudi/pull/9550/files] [https://github.com/apache/hudi/pull/9542] [https://github.com/apache/hudi/pull/9551] to see whether they were causing the timeouts, but the timeout issue was still visible.

---

06/Sep/23 17:17, ljain: In some runs, such as attempts 2 and 3 of [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=19384&view=logs&j=ba200224-5437-5e21-2643-114ac65587f4], the timeout occurs after 87 tests have run. In others, such as [https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/19500/logs/7], it occurs after only 15 tests have run.

---

06/Sep/23 17:22, ljain: Some older runs where a similar issue can be seen:

- FT client/spark-client: [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=18864&view=logs&j=7601efb9-4019-552e-11ba-eb31b66593b2]
- UT FT other modules: [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=18152&view=logs&j=dcedfe73-9485-5cc5-817a-73b61fc5dcb0]

---

06/Sep/23 17:26, ljain: Usually the error message is of the form:
{code:java}
##[error]We stopped hearing from agent Hosted Agent. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error. For more information, see: https://go.microsoft.com/fwlink/?linkid=846610
{code}
or of the form:
{code:java}
The job running on agent Azure Pipelines 7 ran longer than the maximum time of 150 minutes.
{code}
This could be a genuine timeout where the tests ran for the full 150 minutes. In many cases, however, the raw logs show:
{code:java}
2023-07-28T06:01:14.7898970Z at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_372]
2023-07-28T06:01:14.7899483Z at org.apache.hudi.metrics.Metrics.shutdown(Metrics.java:116) ~[hudi-client-common-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
2023-07-28T06:01:14.7899862Z at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_372]
2023-07-28T07:27:41.0195722Z ##[error]The operation was canceled.
2023-07-28T07:27:41.0242402Z ##[section]Finishing: FT client/spark-client
{code}
There is a gap of more than an hour (06:01 to 07:27) between the last test run and the cancellation.

---

06/Sep/23 17:29, ljain: Found a Microsoft Developer Community ticket for a similar issue where Azure CI times out: [https://developercommunity.visualstudio.com/t/errorthe-operation-was-canceled-azure-devops-build/692048?viewtype=all]

---

06/Sep/23 17:30, ljain: Created [https://github.com/apache/hudi/pull/9627] and [https://github.com/apache/hudi/pull/9628] with {{TestHoodieRealtimeRecordReader}} disabled. UT FT timed out after 4 hours in both PRs. FT client/spark-client also timed out in the CI run [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=19671&view=logs&j=7601efb9-4019-552e-11ba-eb31b66593b2&t=9688f101-287d-53f4-2a80-87202516f5d0].

---

06/Sep/23 17:43, ljain: Created [https://github.com/apache/hudi/pull/9604], PRs 9605-9609, [https://github.com/apache/hudi/pull/9632] and [https://github.com/apache/hudi/pull/9633] to test Azure CI on the Hudi 0.13.0 branch.

---

07/Sep/23 13:19, ljain: The Maven version is 3.8.8 and has not changed since June, and we don't have Azure runs from before then.

---

07/Sep/23 18:21, ljain: First known occurrence of the timeout issue (12th June): [https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=17760&view=results]

---

08/Sep/23 16:50, ljain: PRs [https://github.com/apache/hudi/pull/9604] to 9607 were run on the 0.13.0 branch. Of the 8 PRs in total, 4 failed with a timeout and the other 4 were cancelled because of other critical PRs. The issue is reproducible with 0.13.0.

---

08/Sep/23 16:53, ljain: Created a PR to move hudi-hadoop-mr and hudi-java-client to GitHub Actions: [https://github.com/apache/hudi/pull/9661]

---

08/Sep/23 16:53, ljain: Created a PR that disables the DeltaStreamer continuous-mode tests: [https://github.com/apache/hudi/pull/9662]

---

08/Sep/23 17:02, ljain: Also created PRs [https://github.com/apache/hudi/pull/9654/files] to 9657, which move hudi-hadoop-mr and hudi-java-client to a separate job. Timeout errors are still seen in the UT FT jobs.

---

11/Sep/23 17:21, ljain: Created PRs [https://github.com/apache/hudi/pull/9676] to 9678 after disabling TestHoodieCombineHiveInputFormat. I was seeing flakiness in the GitHub check for the separated hudi-hadoop-mr and hudi-java-client modules. Of the 3 PRs, 2 passed but 1 will time out.

---

11/Sep/23 17:42, ljain: Created [https://github.com/apache/hudi/pull/9682] to add a 40-minute timeout to the hudi-hadoop-mr check. Currently the GitHub check takes 6 hours to time out.

---

11/Sep/23 17:48, ljain: Created [https://github.com/apache/hudi/pull/9683] with Maven debug logging enabled. Only the newly added GitHub check runs in this PR, and it should print more detailed logs.

---

12/Sep/23 16:15, ljain: Regarding PRs [https://github.com/apache/hudi/pull/9676] to 9678, created after disabling TestHoodieCombineHiveInputFormat: with this test disabled, the timeout usually occurs in hudi-java-client only. Otherwise, it occurs in both hudi-hadoop-mr and hudi-java-client.

---

12/Sep/23 16:16, ljain: After enabling more debug logs, I created a few PRs.
In one of them (https://github.com/apache/hudi/actions/runs/6158416987/job/16711145162?pr=9690), I can see that all methods annotated with @After complete before the timeout.
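The hour-plus gap noted in the 06/Sep/23 17:26 comment was found by reading the raw log by hand. A minimal Python sketch for locating the widest silence between consecutive timestamped lines follows. It assumes each relevant line begins with an Azure-style timestamp such as `2023-07-28T06:01:14.7898970Z` (as in the excerpt quoted in that comment) and skips lines that don't; `parse_ts` and `largest_gap` are hypothetical helpers, not part of Hudi or Azure tooling.

```python
import re
from datetime import datetime

# Azure Pipelines raw log lines start with an ISO-8601 UTC timestamp that
# carries 7 fractional digits, e.g. 2023-07-28T06:01:14.7898970Z.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)Z")


def parse_ts(line):
    """Return the line's timestamp as a datetime, or None if it has none."""
    m = TS.match(line)
    if not m:
        return None
    # datetime only supports microseconds, so keep the first 6 fractional digits.
    frac = (m.group(2) + "000000")[:6]
    return datetime.strptime(m.group(1) + "." + frac, "%Y-%m-%dT%H:%M:%S.%f")


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the widest silence, or None."""
    best = None
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # untimestamped lines do not reset the clock
        if prev_ts is not None:
            gap = ts - prev_ts
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best
```

Fed the five log lines quoted in the 06/Sep/23 17:26 comment, `largest_gap` pairs the `Thread.run` frame with the `##[error]The operation was canceled.` line, a gap of about 1 h 26 min. On a downloaded raw log it can be called as `largest_gap(open(path))`, where `path` is whatever file the log was saved to.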

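Several comments note that the jobs go silent long before the 150-minute (Azure) or 6-hour (GitHub) limit is reached, and PR 9682 adds a fixed 40-minute limit to the hudi-hadoop-mr check. A related technique, which is not what that PR does, is an idle-output watchdog that fails a step as soon as it has printed nothing for a set interval. The following is a minimal Python sketch under that assumption; `run_with_idle_timeout` is a hypothetical helper.

```python
import queue
import subprocess
import sys
import threading


def run_with_idle_timeout(cmd, idle_seconds):
    """Run cmd, echoing its output; kill it if it is silent for idle_seconds.

    Returns (exit_code, timed_out); exit_code is None when the process was killed.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = queue.Queue()

    def pump():
        # Forward each output line to the queue; None marks end of stream.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            line = lines.get(timeout=idle_seconds)
        except queue.Empty:
            # No output for idle_seconds: treat the step as hung.
            # kill() only signals the direct child, not its descendants.
            proc.kill()
            proc.wait()
            return None, True
        if line is None:
            return proc.wait(), False
        sys.stdout.write(line)  # keep the output visible in the CI log
```

An illustrative call would be `run_with_idle_timeout(["mvn", "test", "-pl", "hudi-hadoop-mr"], idle_seconds=20 * 60)`. Because `kill()` only terminates the direct child, forked test JVMs could outlive it; a real CI wrapper would likely need to kill the whole process group.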