XComp commented on a change in pull request #15170:
URL: https://github.com/apache/flink/pull/15170#discussion_r594269000
##########
File path:
flink-tests/src/test/java/org/apache/flink/test/recovery/ProcessFailureCancelingITCase.java
##########
@@ -221,15 +210,11 @@ public Long map(Long value) throws Exception {
final Collection<JobID> jobIds = waitForRunningJobs(clusterClient,
timeout);
assertThat(jobIds, hasSize(1));
- final JobID jobId = jobIds.iterator().next();
// kill the TaskManager after the job started to run
taskManagerProcess.destroy();
taskManagerProcess = null;
- // try to cancel the job
- clusterClient.cancel(jobId).get();
-
// we should see a failure within reasonable time (10s is the ask
timeout).
// since the CI environment is often slow, we conservatively give
it up to 2 minutes,
// to fail, which is much lower than the failure time given by the
heartbeats ( > 2000s)
Review comment:
```suggestion
// to fail, which is much higher than the failure time given by
the heartbeats
```
I'm puzzled by this comment: Is it correct or isn't my proposed change the
correct version? 🤔
The default heartbeat timeout is set to `50s` and not explicitly set in the
test
##########
File path:
flink-tests/src/test/java/org/apache/flink/test/recovery/ProcessFailureCancelingITCase.java
##########
@@ -244,10 +229,14 @@ public Long map(Long value) throws Exception {
assertTrue(error instanceof ProgramInvocationException);
// all seems well :-)
} catch (Exception e) {
- printProcessLog("TaskManager",
taskManagerProcess.getErrorOutput().toString());
+ if (taskManagerProcess != null) {
+ printProcessLog("TaskManager",
taskManagerProcess.getErrorOutput().toString());
+ }
throw e;
} catch (Error e) {
- printProcessLog("TaskManager 1",
taskManagerProcess.getErrorOutput().toString());
+ if (taskManagerProcess != null) {
+ printProcessLog("TaskManager 1",
taskManagerProcess.getErrorOutput().toString());
Review comment:
Can't we merge these two catch blocks together and catch `Throwable`? I
don't see any value in adding the ` 1` to the process name.
##########
File path:
flink-tests/src/test/java/org/apache/flink/test/recovery/ProcessFailureCancelingITCase.java
##########
@@ -244,10 +229,14 @@ public Long map(Long value) throws Exception {
assertTrue(error instanceof ProgramInvocationException);
// all seems well :-)
} catch (Exception e) {
- printProcessLog("TaskManager",
taskManagerProcess.getErrorOutput().toString());
+ if (taskManagerProcess != null) {
+ printProcessLog("TaskManager",
taskManagerProcess.getErrorOutput().toString());
+ }
throw e;
} catch (Error e) {
- printProcessLog("TaskManager 1",
taskManagerProcess.getErrorOutput().toString());
+ if (taskManagerProcess != null) {
+ printProcessLog("TaskManager 1",
taskManagerProcess.getErrorOutput().toString());
Review comment:
Can't we merge the too `catch` blocks collecting any `Throwable` (or at
least `Exception | Error`)? I don't see any advantage of adding the ` 1` to the
process name in `stdout`.
##########
File path:
flink-tests/src/test/java/org/apache/flink/test/recovery/ProcessFailureCancelingITCase.java
##########
@@ -221,15 +210,11 @@ public Long map(Long value) throws Exception {
final Collection<JobID> jobIds = waitForRunningJobs(clusterClient,
timeout);
assertThat(jobIds, hasSize(1));
- final JobID jobId = jobIds.iterator().next();
// kill the TaskManager after the job started to run
taskManagerProcess.destroy();
taskManagerProcess = null;
- // try to cancel the job
- clusterClient.cancel(jobId).get();
-
// we should see a failure within reasonable time (10s is the ask
timeout).
Review comment:
```suggestion
// we should see a failure within reasonable time (100s is the
ask timeout).
```
The ask timeout is set to 100 seconds in this test.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]