MartijnVisser opened a new pull request, #28636:
URL: https://github.com/apache/flink/pull/28636

   ## What is the purpose of the change
   
   Fixes flaky `DynamicParameterITCase` runs that either hang the whole e2e_4 
leg for hours or fail with `Missing required option: c` (Azure builds 75992, 
76627). The distribution's log4j configuration rolls the log file on startup 
(`OnStartupTriggeringPolicy`), so the JobManager startup banner (program 
arguments, classpath) frequently lands in a rolled `.log.N` file, which 
`FlinkDistribution.searchAllLogs` deliberately skips. The test then either 
polls indefinitely for a banner that never appears in the live `.log`, 
or parses a half-written arguments block.
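
The failure mode above can be reproduced in miniature. The following is a minimal sketch, not the actual `FlinkDistribution` code: the class `RolledLogSearchSketch`, its `search` methods, the file names, and the banner text `Program Arguments:` are all assumptions made for illustration. It shows a live-only search missing a banner that sits in a rolled file, and an opt-in flag that finds it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not the actual FlinkDistribution code.
final class RolledLogSearchSketch {

    // Assumed naming scheme: live logs end in ".log", rolled logs in ".log.<N>".
    private static final Pattern ROLLED = Pattern.compile(".*\\.log\\.\\d+");

    // Two-argument form: live logs only, so existing callers keep their behavior.
    static List<String> search(Path logDir, String needle) throws IOException {
        return search(logDir, needle, false);
    }

    // Three-argument form: optionally also scan rolled files.
    static List<String> search(Path logDir, String needle, boolean includeRolledLogs)
            throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(logDir)) {
            files = listing.sorted().collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            String name = file.getFileName().toString();
            boolean live = name.endsWith(".log");
            boolean rolled = ROLLED.matcher(name).matches();
            if (!live && !(includeRolledLogs && rolled)) {
                continue; // neither a live log nor a requested rolled log
            }
            for (String line : Files.readAllLines(file)) {
                if (line.contains(needle)) {
                    hits.add(line);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("logs");
        // The banner was written before the startup roll, so it sits in the rolled file.
        Files.write(dir.resolve("jm.log.1"), Arrays.asList(" Program Arguments:"));
        Files.write(dir.resolve("jm.log"), Arrays.asList("later output"));
        System.out.println(search(dir, "Program Arguments:").size());       // prints 0
        System.out.println(search(dir, "Program Arguments:", true).size()); // prints 1
    }
}
```

Keeping the two-argument form as a thin delegate means only callers that explicitly ask for rolled logs see different results.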
   
   ## Brief change log
   
     - `FlinkDistribution.searchAllLogs`: add an overload with an 
`includeRolledLogs` flag; the existing 2-arg method delegates with `false`, so 
all other callers are unchanged.
     - `DynamicParameterITCase`: search rolled logs for the startup banner, and 
bound the readiness wait with `CommonTestUtils.waitUtil` (1 minute) so a 
missing banner fails fast with a clear message instead of hanging until the CI 
watchdog kills the leg. The "Classpath:" line is logged after the program 
arguments, so its presence guarantees the complete block has been flushed 
before parsing.
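
The bounded readiness wait can be sketched roughly as follows. This is a simplified stand-in, not the PR's code: the PR uses `CommonTestUtils.waitUtil`, whereas `BoundedBannerWaitSketch`, `waitUntil`, and `bannerComplete` are hypothetical names introduced here, and the 10 ms polling interval is an arbitrary choice.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for illustration; the PR itself relies on CommonTestUtils.waitUtil.
final class BoundedBannerWaitSketch {

    // The banner is treated as complete once the trailing "Classpath:" line is present,
    // because that line is logged after the program arguments.
    static boolean bannerComplete(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains("Classpath:")) {
                return true;
            }
        }
        return false;
    }

    // Polls the condition until it holds or the timeout elapses, then fails with a clear message.
    static void waitUntil(BooleanSupplier condition, Duration timeout, String errorMessage)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(errorMessage);
            }
            Thread.sleep(10); // assumed polling interval
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> complete = Arrays.asList(" Program Arguments:", "    -c", " Classpath: a.jar");
        List<String> partial = Arrays.asList(" Program Arguments:", "    -c");

        // Returns immediately: the closing line is already there.
        waitUntil(() -> bannerComplete(complete), Duration.ofMinutes(1), "banner not found");

        // Fails fast instead of hanging: the closing line never shows up.
        try {
            waitUntil(() -> bannerComplete(partial), Duration.ofMillis(100), "banner not found");
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

Gating on the last line of the block, rather than the first, is what prevents the parser from reading a partially written arguments section.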
   
   ## Verifying this change
   
   This change is already covered by existing tests (`DynamicParameterITCase`, 
all parameterizations green locally). The hang needs the on-startup log 
rotation to move the banner into a rolled file, which depends on prior runs' 
log state on the CI machine and is not deterministically reproducible locally.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes (Claude Opus 4.8, via Claude Code)
   
   Generated-by: Claude Opus 4.8 (1M context)
   

