Shilun Fan created MAPREDUCE-7531:
-------------------------------------
Summary: TestMRJobs.testThreadDumpOnTaskTimeout flaky due to
thread dump delayed write
Key: MAPREDUCE-7531
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7531
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mapreduce-client
Affects Versions: 3.5.0
Reporter: Shilun Fan
Assignee: Shilun Fan
*Problem*
TestMRJobs.testThreadDumpOnTaskTimeout is flaky because it checks for thread
dump output immediately after job completion. The thread dump may be written
to the logs asynchronously, after a delay, so the check can run before the dump
exists and the test fails spuriously.
*Solution*
Add a polling mechanism (up to 30 seconds, retrying every second) that waits for
the thread dump to appear in syslog/stdout before asserting. This ensures:
- Log file counts are correct
- Container types (AM/Map) match expectations
- Thread dumps are actually written before validation
*Changes*
- Wrap log scanning in a retry loop with a 30-second deadline
- Accumulate container counts and thread dump flags in each iteration
- Move all assertions after polling completes to avoid race conditions
- Fix @Timeout values for related tests (3000→300 seconds)
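
The retry loop described above could look roughly like the following. This is a minimal sketch, not the actual patch: the class ThreadDumpPolling, the ScanResult holder, and the scanner supplier are hypothetical names standing in for the test's real log-scanning code, and the interval and deadline are passed as parameters so the values from this issue (1000 ms, 30000 ms) are only one possible choice.

```java
import java.util.function.Supplier;

// Hypothetical helper; names are illustrative and not taken from TestMRJobs.
class ThreadDumpPolling {

  /** Hypothetical snapshot of one pass over the container log directories. */
  static final class ScanResult {
    final int appMasterLogs;        // log files seen for the AM container
    final int mapLogs;              // log files seen for map containers
    final boolean threadDumpFound;  // true once the dump text is visible

    ScanResult(int appMasterLogs, int mapLogs, boolean threadDumpFound) {
      this.appMasterLogs = appMasterLogs;
      this.mapLogs = mapLogs;
      this.threadDumpFound = threadDumpFound;
    }
  }

  /**
   * Rescans the logs until the thread dump is found or the deadline passes.
   * Returns the last scan so that all assertions can run after polling ends.
   */
  static ScanResult pollForThreadDump(Supplier<ScanResult> scanner,
      long intervalMs, long timeoutMs) throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    ScanResult last = scanner.get();
    // Compare via subtraction so the check stays correct if nanoTime wraps.
    while (!last.threadDumpFound && System.nanoTime() - deadline < 0) {
      Thread.sleep(intervalMs);
      last = scanner.get();  // recompute counts and flags from scratch
    }
    return last;
  }
}
```

With the values from this issue, a test would call pollForThreadDump(scanner, 1000, 30000) and only then assert on the returned counts and on threadDumpFound. The loop itself asserts nothing, so a slow log write costs extra wait time rather than an immediate failure.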