lukecwik commented on a change in pull request #16769:
URL: https://github.com/apache/beam/pull/16769#discussion_r801123126



##########
File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataOutboundAggregator.java
##########
@@ -199,7 +210,14 @@ public void sendBufferedDataAndFinishOutboundStreams() {
           .setIsLast(true);
       entry.getValue().resetStats();
     }
+    if (collectElementsIfNoFlushes && !hasFlushedForBundle) {
+      return bufferedElements.build();
+    }
     outboundObserver.onNext(bufferedElements.build());
+    // This is now at the end of a bundle, so we reset hasFlushedForBundle to 
prepare for new
+    // bundles.
+    hasFlushedForBundle = false;

Review comment:
       `hasFlushedForBundle` is outside of the `flushLock` and is being 
read/written by different threads.

##########
File path: 
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java
##########
@@ -986,7 +1002,13 @@ void finish() {
       inboundObserver2 =
           BeamFnDataInboundObserver2.forConsumers(getInboundDataEndpoints(), 
getTimerEndpoints());
       for (BeamFnDataOutboundAggregator aggregator : 
getOutboundAggregators().values()) {
-        aggregator.startFlushThread();
+        aggregator.start(
+            getRunnerCapabilities()
+                    .contains(
+                        BeamUrns.getUrn(
+                            
StandardRunnerProtocols.Enum.CONTROL_RESPONSE_ELEMENTS_EMBEDDING))
+                // TODO: Handle multiple outbound ApiServiceDescriptors.

Review comment:
       ```suggestion
                   // TODO(BEAM-13193): Handle multiple outbound 
ApiServiceDescriptors.
   ```

##########
File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataOutboundAggregator.java
##########
@@ -85,10 +87,17 @@ public BeamFnDataOutboundAggregator(
     this.processBundleRequestIdSupplier = processBundleRequestIdSupplier;
     this.bytesWrittenSinceFlush = 0L;
     this.flushLock = new Object();
+    this.hasFlushedForBundle = false;
   }
 
-  /** Starts the flushing daemon thread if data_buffer_time_limit_ms is set. */
-  public void startFlushThread() {
+  /**
+   * Starts the flushing daemon thread if data_buffer_time_limit_ms is set. If
+   * collectElementsIfNoFlushes is true, {@link
+   * #sendOrCollectBufferedDataAndFinishOutboundStreams()} returns the 
buffered data instead of
+   * sending it to the outboundObserver if no flushes have taken place within 
this bundle
+   */
+  public void start(boolean collectElementsIfNoFlushes) {
+    this.collectElementsIfNoFlushes = collectElementsIfNoFlushes;

Review comment:
       We should know that at construction time since we should know the runner 
capability before we even start processing a bundle allowing for 
`collectElementsIfNoFlushes` to be final but this would be difficult with out 
solving the multiple aggregator case which doesn't seem challenging to capture 
the results for each aggregator and either embed them in the response if you 
captured them all or replay them back to the aggregator if one of them returned 
null.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to