teamconfx commented on code in PR #27462:
URL: https://github.com/apache/flink/pull/27462#discussion_r2723443212
##########
flink-runtime/src/test/java/org/apache/flink/runtime/testutils/CommonTestUtils.java:
##########
@@ -70,6 +71,9 @@ public class CommonTestUtils {
private static final long RETRY_INTERVAL = 100L;
+ /** Default timeout for waiting on tasks to reach running state. */
+ public static final Duration DEFAULT_WAIT_FOR_TASKS_TIMEOUT =
Duration.ofMinutes(5);
Review Comment:
I think a timeout is a good guard for bounding how long a unit test can run
(especially when the wait could otherwise block forever).
For this specific case, I found that if I inject a fault condition (e.g., a
node restart) into some Flink unit tests, the job is lost and the wait
hangs forever, as I described in the JIRA.
One concrete example is this unit test:
[org.apache.flink.test.streaming.runtime.SinkMetricsITCase](https://github.com/apache/flink/blob/master/flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/SinkMetricsITCase.java#L111).
If you inject a node restart (restart the TaskManager) between lines 110 and
111, the corrupted `jobID` makes the unit test hang forever.
I chose 5 minutes because it seems like a reasonable timeout for a unit
test: most unit tests in the Flink suite finish within this
window.
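
To illustrate the kind of guard I mean, here is a minimal sketch of a polling wait bounded by a deadline. `BoundedWait` and `waitUntil` are hypothetical names for illustration only, not the actual `CommonTestUtils` code in this PR; the sketch only assumes a retry interval and a timeout like the constants in the diff above.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not part of Flink's CommonTestUtils. */
final class BoundedWait {
    private BoundedWait() {}

    /**
     * Polls the condition until it holds, sleeping retryInterval between checks.
     * Throws TimeoutException once the timeout has elapsed instead of waiting forever.
     */
    static void waitUntil(BooleanSupplier condition, Duration timeout, Duration retryInterval)
            throws InterruptedException, TimeoutException {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Condition not met within " + timeout);
            }
            Thread.sleep(retryInterval.toMillis());
        }
    }
}
```

With a guard like this, a lost job turns into a `TimeoutException` after the configured window (5 minutes by default in this PR) rather than an indefinite hang, so the test fails with a clear cause.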