MehulBatra opened a new pull request, #1280:
URL: https://github.com/apache/fluss/pull/1280

   Purpose
   Linked issue: close #1031 
   Add watchdog monitoring for Flink tests that captures debugging information when CI builds time out, helping to identify the root causes of stuck test processes.
   Brief change log
   
   - Added tools/ci/flink_watchdog.sh: A new watchdog script that monitors Flink test execution and captures thread dumps as tests approach the timeout limit.
   
   - Modified .github/workflows/ci.yaml: Updated the CI workflow to run the watchdog only for Flink module tests (matrix.module == 'flink').
   
   - Enhanced debugging capabilities: The watchdog captures Java process information and thread dumps (jstack) at 95% of the timeout and again just before the process is terminated.
   
   - Artifact collection: Added a new CI step that uploads debug artifacts (thread dumps and execution logs) when Flink tests fail due to a timeout.
   
   - Zero impact on core tests: Core module tests continue to run without any watchdog overhead.
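   The timing behaviour described above can be illustrated with a hypothetical Python sketch. The actual change is the shell script tools/ci/flink_watchdog.sh and this is not its implementation; `run_with_watchdog`, `early_fraction`, and the `dump` callback are illustrative names. The command runs under a hard timeout, `dump(0)` fires once at 95% of the timeout, and `dump(1)` fires just before the process is killed.

```python
import subprocess
import sys
import time
from typing import Callable, Optional, Sequence


def run_with_watchdog(
    cmd: Sequence[str],
    timeout_s: float,
    dump: Callable[[int], None],
    early_fraction: float = 0.95,
    poll_s: float = 0.05,
) -> Optional[int]:
    """Run cmd under a hard timeout, taking diagnostic snapshots on the way.

    dump(0) is called once when elapsed time reaches early_fraction * timeout_s;
    dump(1) is called right before the process is killed at timeout_s.
    Returns the exit code, or None if the watchdog had to kill the process.
    """
    proc = subprocess.Popen(cmd)
    start = time.monotonic()
    early_dump_taken = False
    while True:
        code = proc.poll()
        if code is not None:
            return code  # the command finished on its own
        elapsed = time.monotonic() - start
        if not early_dump_taken and elapsed >= early_fraction * timeout_s:
            dump(0)  # first snapshot, e.g. written to jps-traces.0
            early_dump_taken = True
        if elapsed >= timeout_s:
            dump(1)  # final snapshot, e.g. written to jps-traces.1
            proc.kill()
            proc.wait()  # reap the child so it does not linger as a zombie
            return None
        time.sleep(poll_s)


if __name__ == "__main__":
    # Simulated stuck test: the child sleeps far longer than the 1 s timeout.
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    snapshots = []
    result = run_with_watchdog(stuck, timeout_s=1.0, dump=snapshots.append)
    print(result, snapshots)  # None [0, 1]
```

   Polling keeps the sketch simple and portable; the `dump` callback is where a helper that writes the thread-dump files would plug in, so the timing logic can be exercised locally with a sleeping child process and no JVM at all.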
   
   This introduces a new CI debugging feature for Flink tests. Key aspects:
   
   Debug artifacts: When Flink tests time out, three files are created and uploaded:
   
   1. test-output: Complete timestamped execution log
   2. jps-traces.0: Thread dump captured at 95% of the timeout (shows which processes are stuck)
   3. jps-traces.1: Final thread dump before process termination
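   The contents of these files can be illustrated with a minimal Python sketch. It assumes the JDK's jps and jstack tools are on PATH; `run_tool`, `append_log`, and `write_thread_dumps` are hypothetical helper names, and the timestamp format and the `==== jstack <pid> ====` separator are illustrative choices, not necessarily what tools/ci/flink_watchdog.sh writes.

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Callable, List


def run_tool(args: List[str]) -> str:
    """Run a JDK tool such as jps or jstack and return its stdout (assumes it is on PATH)."""
    return subprocess.run(args, capture_output=True, text=True, check=False).stdout


def append_log(out_dir: Path, line: str) -> None:
    """Append one timestamped line to the test-output execution log."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(out_dir / "test-output", "a", encoding="utf-8") as fh:
        fh.write(f"{stamp} {line}\n")


def write_thread_dumps(out_dir: Path, index: int,
                       run: Callable[[List[str]], str] = run_tool) -> Path:
    """Write the `jps -l` listing plus one `jstack <pid>` dump per JVM to jps-traces.<index>."""
    listing = run(["jps", "-l"])
    sections = [listing]
    for line in listing.splitlines():
        parts = line.split(maxsplit=1)
        if not parts or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        if len(parts) > 1 and parts[1].endswith("Jps"):
            continue  # jps lists itself, and that process has already exited
        pid = parts[0]
        sections.append(f"==== jstack {pid} ====\n{run(['jstack', pid])}")
    target = out_dir / f"jps-traces.{index}"
    target.write_text("\n".join(sections), encoding="utf-8")
    return target
```

   Passing a fake `run` makes the helper testable without a running JVM. Calling `write_thread_dumps(out_dir, 0)` at the 95% mark and `write_thread_dumps(out_dir, 1)` just before termination would yield the jps-traces.0 and jps-traces.1 files listed above.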
   
   
   Artifact download: Failed Flink CI runs will have a downloadable flink-debug-{run-number} artifact containing all of the debugging information.
   
   Tests
   
   - Local testing: The watchdog script was tested locally with simulated timeout scenarios using sleep commands.
   
   
   API and Format
   None
   
   Documentation
   None
   
   


-- 
This is an automated message from the Apache Git Service.