Shilun Fan created MAPREDUCE-7531:
-------------------------------------

             Summary: TestMRJobs.testThreadDumpOnTaskTimeout flaky due to 
thread dump delayed write
                 Key: MAPREDUCE-7531
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7531
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mapreduce-client
    Affects Versions: 3.5.0
            Reporter: Shilun Fan
            Assignee: Shilun Fan


*Problem*
TestMRJobs.testThreadDumpOnTaskTimeout is flaky because it checks for thread 
dump output immediately after the job completes. The thread dump may be 
written to the logs asynchronously, after a delay, so the check can run 
before the dump is present and fail spuriously.
 
*Solution*
Add a polling mechanism (up to 30 seconds, retrying every second) that waits 
for the thread dump to appear in syslog/stdout before asserting. This ensures:
- Log file counts are correct
- Container types (AM/Map) match expectations 
- Thread dumps are actually written before validation

*Changes*
- Wrap log scanning in a retry loop with a 30-second deadline
- Accumulate container counts and thread dump flags in each iteration
- Move all assertions after polling completes to avoid race conditions
- Fix @Timeout values for related tests (3000→300 seconds)
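
The retry loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch. The class ThreadDumpPollSketch, the ScanResult holder, and the scanner supplier are hypothetical names introduced here; the real test reads the container syslog/stdout files, for which the supplier is only a stand-in.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper illustrating "poll until the thread dump shows up,
// then assert once". Not taken from the actual TestMRJobs change.
class ThreadDumpPollSketch {

  /** Outcome of one pass over the container logs (assumed shape). */
  static final class ScanResult {
    final int amContainers;        // AM containers seen in this pass
    final int mapContainers;       // map containers seen in this pass
    final boolean threadDumpFound; // true once a thread dump was seen

    ScanResult(int amContainers, int mapContainers, boolean threadDumpFound) {
      this.amContainers = amContainers;
      this.mapContainers = mapContainers;
      this.threadDumpFound = threadDumpFound;
    }
  }

  /**
   * Re-runs the scan until a thread dump is found or the deadline passes,
   * and returns the result of the last scan. Callers assert on that result
   * only after this method returns.
   */
  static ScanResult pollForThreadDump(Supplier<ScanResult> scanner,
                                      long timeoutMs, long intervalMs) {
    final long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    ScanResult last = scanner.get();
    while (!last.threadDumpFound && deadline - System.nanoTime() > 0) {
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt and stop polling early.
        Thread.currentThread().interrupt();
        break;
      }
      last = scanner.get();
    }
    return last;
  }
}
```

With the values from this issue, a caller would pass a 30000 ms timeout and a 1000 ms interval, then check the container counts and the thread dump flag on the returned result. Keeping the assertions outside the loop means a dump that is written late only delays the test and does not fail it.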
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
