[ https://issues.apache.org/jira/browse/BEAM-6170?focusedWorklogId=176465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-176465 ]

ASF GitHub Bot logged work on BEAM-6170:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Dec/18 10:36
            Start Date: 18/Dec/18 10:36
    Worklog Time Spent: 10m 
      Work Description: echauchot closed pull request #7191: [BEAM-6170] Change Nexmark stuckness warnings to not fail pipeline
URL: https://github.com/apache/beam/pull/7191
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkLauncher.java b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkLauncher.java
index c797064b4b5a..18c874195d06 100644
--- a/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkLauncher.java
+++ b/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkLauncher.java
@@ -122,13 +122,13 @@
    * activity.
    */
   private static final Duration DONE_DELAY = Duration.standardMinutes(1);
-  /** How long to allow no activity without warning. */
+  /** How long to allow no activity at sources and sinks without warning. */
   private static final Duration STUCK_WARNING_DELAY = Duration.standardMinutes(10);
   /**
-   * How long to let streaming pipeline run after we've seen no activity, even if all events have
-   * not been generated.
+   * How long to let streaming pipeline run after we've seen no activity at sources or sinks, even
+   * if all events have not been generated.
    */
-  private static final Duration STUCK_TERMINATE_DELAY = Duration.standardDays(3);
+  private static final Duration STUCK_TERMINATE_DELAY = Duration.standardHours(1);
 
   /** NexmarkOptions for this run. */
   private final OptionT options;
@@ -456,7 +456,7 @@ private NexmarkPerf monitor(NexmarkQuery query) {
         }
 
         if (fatalCount > 0) {
-          NexmarkUtils.console("job has fatal errors, cancelling.");
+          NexmarkUtils.console("ERROR: job has fatal errors, cancelling.");
           errors.add(String.format("Pipeline reported %s fatal errors", fatalCount));
           waitingForShutdown = true;
           cancelJob = true;
@@ -468,16 +468,19 @@ private NexmarkPerf monitor(NexmarkQuery query) {
           NexmarkUtils.console("streaming query appears to have finished waiting for completion.");
           waitingForShutdown = true;
         } else if (quietFor.isLongerThan(STUCK_TERMINATE_DELAY)) {
-          NexmarkUtils.console("streaming query appears to have gotten stuck, cancelling job.");
-          errors.add("Cancelling streaming job since it appeared stuck");
+          NexmarkUtils.console(
+              "ERROR: streaming query appears to have been stuck for %d minutes, cancelling job.",
+              quietFor.getStandardMinutes());
+          errors.add(
+              String.format(
+                  "Cancelling streaming job since it appeared stuck for %d min.",
+                  quietFor.getStandardMinutes()));
           waitingForShutdown = true;
           cancelJob = true;
         } else if (quietFor.isLongerThan(STUCK_WARNING_DELAY)) {
           NexmarkUtils.console(
               "WARNING: streaming query appears to have been stuck for %d min.",
               quietFor.getStandardMinutes());
-          errors.add(
-              String.format("Streaming query was stuck for %d min", quietFor.getStandardMinutes()));
         }
 
         if (cancelJob) {


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 176465)
    Time Spent: 1.5h  (was: 1h 20m)

> NexmarkLauncher stall warning causes benchmark failure
> ------------------------------------------------------
>
>                 Key: BEAM-6170
>                 URL: https://issues.apache.org/jira/browse/BEAM-6170
>             Project: Beam
>          Issue Type: Bug
>          Components: testing
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: Trivial
>   Original Estimate: 10m
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The monitoring in the NexmarkLauncher watches the pipeline results and source 
> for events.  If no events are consumed from the source or published to the 
> output for longer than STUCK_WARNING_DELAY, it prints a warning message:
>  "WARNING: streaming query appears to have been stuck for %d min."
> However, it also adds this message to the errors list, which causes the 
> pipeline to terminate by throwing an exception.  There is a separate 
> STUCK_TERMINATE_DELAY which is used to terminate the pipeline.  It seems 
> inconsistent to cause the pipeline to fail with the warning timeout.  The 
> warning is also not 100% accurate because the pipeline may be busy processing 
> stages other than the input or output.
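
The warning/terminate distinction described in the issue can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the NexmarkLauncher code: the class `StuckMonitorSketch`, the `Action` enum and the `check` method are hypothetical names, and `java.time.Duration` stands in for the Joda-Time `Duration` used by the launcher so that the example runs on a plain JDK. The two thresholds are the values shown in the merged diff.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the stuck-pipeline check; not the actual Beam code. */
class StuckMonitorSketch {
  // Threshold values as they appear in the merged diff.
  static final Duration STUCK_WARNING_DELAY = Duration.ofMinutes(10);
  static final Duration STUCK_TERMINATE_DELAY = Duration.ofHours(1);

  /** Illustrative outcome of one monitoring pass (assumed name). */
  enum Action { NONE, WARN, CANCEL }

  /**
   * Decides what to do after quietFor of no activity at sources or sinks.
   * Only the terminate branch records an error; the warning branch just logs.
   */
  static Action check(Duration quietFor, List<String> errors) {
    if (quietFor.compareTo(STUCK_TERMINATE_DELAY) > 0) {
      errors.add(
          String.format(
              "Cancelling streaming job since it appeared stuck for %d min.",
              quietFor.toMinutes()));
      return Action.CANCEL;
    } else if (quietFor.compareTo(STUCK_WARNING_DELAY) > 0) {
      // Warning only: nothing is added to errors, so the run does not fail.
      System.out.printf(
          "WARNING: streaming query appears to have been stuck for %d min.%n",
          quietFor.toMinutes());
      return Action.WARN;
    }
    return Action.NONE;
  }

  public static void main(String[] args) {
    List<String> errors = new ArrayList<>();
    System.out.println(check(Duration.ofMinutes(5), errors));
    System.out.println(check(Duration.ofMinutes(15), errors));
    System.out.println(check(Duration.ofMinutes(61), errors));
    System.out.println("errors recorded: " + errors.size());
  }
}
```

The point of the sketch is that the errors list is touched only once the terminate threshold is exceeded, so a pipeline that is merely quiet for more than ten minutes keeps running.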



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
