XComp commented on a change in pull request #15170:
URL: https://github.com/apache/flink/pull/15170#discussion_r594269000



##########
File path: 
flink-tests/src/test/java/org/apache/flink/test/recovery/ProcessFailureCancelingITCase.java
##########
@@ -221,15 +210,11 @@ public Long map(Long value) throws Exception {
             final Collection<JobID> jobIds = waitForRunningJobs(clusterClient, 
timeout);
 
             assertThat(jobIds, hasSize(1));
-            final JobID jobId = jobIds.iterator().next();
 
             // kill the TaskManager after the job started to run
             taskManagerProcess.destroy();
             taskManagerProcess = null;
 
-            // try to cancel the job
-            clusterClient.cancel(jobId).get();
-
             // we should see a failure within reasonable time (10s is the ask 
timeout).
             // since the CI environment is often slow, we conservatively give 
it up to 2 minutes,
             // to fail, which is much lower than the failure time given by the 
heartbeats ( > 2000s)

Review comment:
       ```suggestion
               // to fail, which is much higher than the failure time given by 
the heartbeats
   ```
   I'm puzzled by this comment: Is it correct or isn't my proposed change the 
correct version? 🤔 
   The default heartbeat timeout is set to `50s` and not explicitly set in the 
test

##########
File path: 
flink-tests/src/test/java/org/apache/flink/test/recovery/ProcessFailureCancelingITCase.java
##########
@@ -244,10 +229,14 @@ public Long map(Long value) throws Exception {
             assertTrue(error instanceof ProgramInvocationException);
             // all seems well :-)
         } catch (Exception e) {
-            printProcessLog("TaskManager", 
taskManagerProcess.getErrorOutput().toString());
+            if (taskManagerProcess != null) {
+                printProcessLog("TaskManager", 
taskManagerProcess.getErrorOutput().toString());
+            }
             throw e;
         } catch (Error e) {
-            printProcessLog("TaskManager 1", 
taskManagerProcess.getErrorOutput().toString());
+            if (taskManagerProcess != null) {
+                printProcessLog("TaskManager 1", 
taskManagerProcess.getErrorOutput().toString());

Review comment:
       Can't we merge these two catch blocks together and catch `Throwable`? I 
don't see any value in adding the ` 1` to the process name.

##########
File path: 
flink-tests/src/test/java/org/apache/flink/test/recovery/ProcessFailureCancelingITCase.java
##########
@@ -244,10 +229,14 @@ public Long map(Long value) throws Exception {
             assertTrue(error instanceof ProgramInvocationException);
             // all seems well :-)
         } catch (Exception e) {
-            printProcessLog("TaskManager", 
taskManagerProcess.getErrorOutput().toString());
+            if (taskManagerProcess != null) {
+                printProcessLog("TaskManager", 
taskManagerProcess.getErrorOutput().toString());
+            }
             throw e;
         } catch (Error e) {
-            printProcessLog("TaskManager 1", 
taskManagerProcess.getErrorOutput().toString());
+            if (taskManagerProcess != null) {
+                printProcessLog("TaskManager 1", 
taskManagerProcess.getErrorOutput().toString());

Review comment:
       Can't we merge the too `catch` blocks collecting any `Throwable` (or at 
least `Exception | Error`)? I don't see any advantage of adding the ` 1` to the 
process name in `stdout`.

##########
File path: 
flink-tests/src/test/java/org/apache/flink/test/recovery/ProcessFailureCancelingITCase.java
##########
@@ -221,15 +210,11 @@ public Long map(Long value) throws Exception {
             final Collection<JobID> jobIds = waitForRunningJobs(clusterClient, 
timeout);
 
             assertThat(jobIds, hasSize(1));
-            final JobID jobId = jobIds.iterator().next();
 
             // kill the TaskManager after the job started to run
             taskManagerProcess.destroy();
             taskManagerProcess = null;
 
-            // try to cancel the job
-            clusterClient.cancel(jobId).get();
-
             // we should see a failure within reasonable time (10s is the ask 
timeout).

Review comment:
       ```suggestion
               // we should see a failure within reasonable time (100s is the 
ask timeout).
   ```
   The ask timeout is set to 100 seconds in this test.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to