liyude-tw commented on code in PR #26809: URL: https://github.com/apache/flink/pull/26809#discussion_r2217918195
########## flink-yarn/src/main/java/org/apache/flink/yarn/configuration/YarnConfigOptions.java: ##########

@@ -110,16 +110,16 @@ public class YarnConfigOptions {
     public static final ConfigOption<Long> APPLICATION_ATTEMPT_FAILURE_VALIDITY_INTERVAL =
             key("yarn.application-attempt-failures-validity-interval")
                     .longType()
-                    .defaultValue(10000L)
+                    .defaultValue(-1L)

Review Comment:
Below is the reasoning that led me to propose -1, and why I believe the change is safer and less surprising than the current default.

1. Few users intentionally depend on the current 10 s window.
   The 10 s sliding window was introduced in PR #8400 by re-using the then-default Akka timeout. It wasn't added to satisfy a concrete production need, so I think almost no one relies on it on purpose; users typically only discover it after being surprised by extra restarts.

2. Hadoop YARN's own default is -1 (global counting).
   Because Flink runs as a YARN ApplicationMaster, aligning with the upstream default reduces the cognitive overhead for operators who administer both systems.

3. The documentation and common intuition both imply global counting.
   The description of `yarn.application-attempts` naturally suggests a total attempt limit, so a hidden time window can be surprising.

### Risk-mitigation proposal

1. Upgrade guide: add the following note in the upgrade section for this release:

   > Starting with this release, `yarn.application-attempt-failures-validity-interval` defaults to -1 (global counting).
   > Clusters that benefit from the previous 10 s sliding window can retain the old behaviour by adding
   > `yarn.application-attempt-failures-validity-interval: 10000`

2. Release notes: repeat the same notice and example so that operators can quickly restore the former setting if needed.
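To make the behavioural difference concrete, here is a minimal, self-contained sketch (not actual Flink or YARN code; the class and method names are hypothetical) of how a failure-validity interval changes when an application is declared failed: with a positive interval, failures older than the interval stop counting toward `yarn.application-attempts`, whereas -1 counts every failure over the application's lifetime.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Illustrative sketch only: models how YARN-style attempt-failure counting
 * differs between a sliding validity window and global counting (-1).
 */
public class AttemptFailureWindowSketch {

    /**
     * Returns true if the application would be declared failed, given the
     * timestamps (ms) of its attempt failures, the max attempt count, and the
     * validity interval (ms); a non-positive interval means global counting.
     */
    static boolean exceedsMaxAttempts(
            long[] failureTimesMs, int maxAttempts, long validityIntervalMs) {
        Deque<Long> recentFailures = new ArrayDeque<>();
        for (long t : failureTimesMs) {
            if (validityIntervalMs > 0) {
                // Sliding window: failures older than the interval expire.
                while (!recentFailures.isEmpty()
                        && t - recentFailures.peekFirst() > validityIntervalMs) {
                    recentFailures.pollFirst();
                }
            }
            recentFailures.addLast(t);
            if (recentFailures.size() >= maxAttempts) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Three failures, 15 s apart, with yarn.application-attempts = 2.
        long[] failures = {0L, 15_000L, 30_000L};

        // Old default (10 s window): each failure expires before the next one
        // arrives, so the application keeps being restarted.
        System.out.println(exceedsMaxAttempts(failures, 2, 10_000L)); // false

        // Proposed default (-1, global counting): the second failure already
        // exhausts the attempt limit.
        System.out.println(exceedsMaxAttempts(failures, 2, -1L)); // true
    }
}
```

This is exactly the "surprising extra restarts" scenario from point 1: any job whose failures are spaced further apart than the 10 s window can restart indefinitely under the current default.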