[ 
https://issues.apache.org/jira/browse/BEAM-14157?focusedWorklogId=752365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752365
 ]

ASF GitHub Bot logged work on BEAM-14157:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Apr/22 16:46
            Start Date: 04/Apr/22 16:46
    Worklog Time Spent: 10m 
      Work Description: TheNeuralBit commented on code in PR #17191:
URL: https://github.com/apache/beam/pull/17191#discussion_r841946902


##########
runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/windmill/GrpcWindmillServerTest.java:
##########
@@ -695,8 +705,24 @@ public void onCompleted() {
     }
     stream.flush();
     stream.close();
+    isClientClosed.set(true);
 
-    Thread.sleep(100);
+    long deadline = System.currentTimeMillis() + 60_000; // 1 min
+    while (true) {
+      Thread.sleep(100);
+      int tmpErrorsAfterClose = errorsAfterClose.get();
+      int tmpErrorsBeforeClose = errorsBeforeClose.get();
+      // wait for at least 2 errors before and after
+      if (tmpErrorsAfterClose > 1 && tmpErrorsBeforeClose > 1) {
+        break;
+      }
+      if (System.currentTimeMillis() > deadline) {
+        fail(

Review Comment:
   Sorry this is a bit of nit, but this check is actually verifying the test 
code itself, right? That is, if we get here there's likely a bug in the test 
code, not in the implementation? If so, it would be nice to clarify this in a 
comment in case this ever starts flaking.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 752365)
    Time Spent: 3.5h  (was: 3h 20m)

> Don't send requests on a closed windmill Grpc streams
> -----------------------------------------------------
>
>                 Key: BEAM-14157
>                 URL: https://issues.apache.org/jira/browse/BEAM-14157
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Arun Pandian
>            Assignee: Arun Pandian
>            Priority: P2
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> GrpcWindmillServer could send requests requests on client closed streams. 
> This leads to windmill streams getting stalling occasionally for few seconds 
> to few minutes. grpc-java doc says not to call onNext to send after a stream 
> is client closed. 
> When the streams get stalled it is logged as "Output channel stalled for {}s, 
> outbound thread {}."  from 
> [here|https://github.com/apache/beam/blob/7727dc99ed5dc1fb46166ef496ab3607ee2779f8/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/DirectStreamObserver.java#L100]
> Ref:
> [https://github.com/grpc/grpc-java/blob/master/stub/src/main/java/io/grpc/stub/StreamObserver.java#L62]
> [https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/windmill/GrpcWindmillServer.java#L939]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to