MehulBatra opened a new pull request, #1280:
URL: https://github.com/apache/fluss/pull/1280
Purpose
<!-- Linking this pull request to the issue -->
Linked issue: close #1031
<!-- What is the purpose of the change -->
Add watchdog monitoring for Flink tests to capture debugging information
when CI builds time out, helping identify the root causes of stuck test processes.
Brief change log
<!-- Please describe the changes made in this pull request and explain how
they address the issue -->
- Added tools/ci/flink_watchdog.sh: New watchdog script that monitors Flink
test execution and captures thread dumps when tests approach timeout limits
- Modified .github/workflows/ci.yaml: Updated CI workflow to use watchdog
specifically for Flink module tests (matrix.module == 'flink')
- Enhanced debugging capabilities: The watchdog captures Java process
information and thread dumps (jstack) twice: once at 95% of the timeout and
again just before the process is terminated
- Artifact collection: Added new CI step to upload debug artifacts (thread
dumps, execution logs) when Flink tests fail due to timeout
- Zero impact on core tests: Core module tests continue to run without any
watchdog overhead
This introduces a new CI debugging feature for Flink tests. Key aspects:
Debug artifacts: When Flink tests time out, three files are created and
uploaded:
1. test-output: Complete timestamped execution log
2. jps-traces.0: Thread dump captured at 95% of timeout (shows stuck
processes)
3. jps-traces.1: Final thread dump before process termination
Artifact download: Failed Flink CI runs will have downloadable
flink-debug-{run-number} artifacts containing all debugging information
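The behavior described above can be sketched in shell roughly as follows. This is an illustration only, not the contents of tools/ci/flink_watchdog.sh: the function names `warn_after`, `dump_traces`, and `run_with_watchdog`, the argument order, and the one-second polling loop are assumptions made for the example. Only the 95% threshold and the file names test-output, jps-traces.0, and jps-traces.1 come from this description. Timestamping of the log is omitted for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a test watchdog; all names here are illustrative.

# Print the number of seconds after which the first dump is taken
# (95% of the timeout, using integer arithmetic).
warn_after() {
  echo $(( $1 * 95 / 100 ))
}

# Append the JVM process list and one jstack dump per JVM to the given file.
# If jps is not on PATH, a note is written instead.
dump_traces() {
  local out="$1"
  {
    date
    if command -v jps >/dev/null 2>&1; then
      jps -l
      for pid in $(jps -q); do
        echo "=== jstack $pid ==="
        jstack "$pid" 2>&1 || true
      done
    else
      echo "jps not found; skipping thread dumps"
    fi
  } >> "$out" 2>&1
}

# Usage: run_with_watchdog TIMEOUT_SECONDS OUT_DIR CMD [ARGS...]
# Runs CMD in the background, polls once per second, dumps traces at 95% of
# the timeout and again at the timeout, then terminates CMD.
# Note: only the direct child is signalled; forked JVMs are not handled here.
run_with_watchdog() {
  local timeout="$1" out_dir="$2"
  shift 2
  mkdir -p "$out_dir"

  "$@" > "$out_dir/test-output" 2>&1 &
  local cmd_pid=$!

  local warn elapsed=0 first_dumped=0
  warn="$(warn_after "$timeout")"

  while kill -0 "$cmd_pid" 2>/dev/null; do
    sleep 1
    elapsed=$(( elapsed + 1 ))
    if [ "$elapsed" -ge "$warn" ] && [ "$first_dumped" -eq 0 ]; then
      dump_traces "$out_dir/jps-traces.0"
      first_dumped=1
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      dump_traces "$out_dir/jps-traces.1"
      kill "$cmd_pid" 2>/dev/null || true
      break
    fi
  done

  # Propagate the exit status of CMD (non-zero if it was terminated).
  local status=0
  wait "$cmd_pid" || status=$?
  return "$status"
}
```

A timeout can be simulated locally in the same spirit as the sleep-based testing mentioned under Tests, for example `run_with_watchdog 10 ./debug sleep 60`, which should leave all three files in `./debug` and return a non-zero status.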
Tests
<!-- List UT and IT cases to verify this change -->
- Local testing: Watchdog script tested locally with simulated timeout
scenarios using sleep commands
API and Format
<!-- Does this change affect API or storage format -->
None
Documentation
<!-- Does this change introduce a new feature -->
None