lukecwik commented on a change in pull request #16769:
URL: https://github.com/apache/beam/pull/16769#discussion_r801123126
##########
File path:
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataOutboundAggregator.java
##########
@@ -199,7 +210,14 @@ public void sendBufferedDataAndFinishOutboundStreams() {
.setIsLast(true);
entry.getValue().resetStats();
}
+ if (collectElementsIfNoFlushes && !hasFlushedForBundle) {
+ return bufferedElements.build();
+ }
outboundObserver.onNext(bufferedElements.build());
+ // This is now at the end of a bundle, so we reset hasFlushedForBundle to
prepare for new
+ // bundles.
+ hasFlushedForBundle = false;
Review comment:
`hasFlushedForBundle` is outside of the `flushLock` and is being
read/written by different threads.
##########
File path:
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/control/ProcessBundleHandler.java
##########
@@ -986,7 +1002,13 @@ void finish() {
inboundObserver2 =
BeamFnDataInboundObserver2.forConsumers(getInboundDataEndpoints(),
getTimerEndpoints());
for (BeamFnDataOutboundAggregator aggregator :
getOutboundAggregators().values()) {
- aggregator.startFlushThread();
+ aggregator.start(
+ getRunnerCapabilities()
+ .contains(
+ BeamUrns.getUrn(
+
StandardRunnerProtocols.Enum.CONTROL_RESPONSE_ELEMENTS_EMBEDDING))
+ // TODO: Handle multiple outbound ApiServiceDescriptors.
Review comment:
```suggestion
// TODO(BEAM-13193): Handle multiple outbound
ApiServiceDescriptors.
```
##########
File path:
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataOutboundAggregator.java
##########
@@ -85,10 +87,17 @@ public BeamFnDataOutboundAggregator(
this.processBundleRequestIdSupplier = processBundleRequestIdSupplier;
this.bytesWrittenSinceFlush = 0L;
this.flushLock = new Object();
+ this.hasFlushedForBundle = false;
}
- /** Starts the flushing daemon thread if data_buffer_time_limit_ms is set. */
- public void startFlushThread() {
+ /**
+ * Starts the flushing daemon thread if data_buffer_time_limit_ms is set. If
+ * collectElementsIfNoFlushes is true, {@link
+ * #sendOrCollectBufferedDataAndFinishOutboundStreams()} returns the
buffered data instead of
+ * sending it to the outboundObserver if no flushes have taken place within
this bundle
+ */
+ public void start(boolean collectElementsIfNoFlushes) {
+ this.collectElementsIfNoFlushes = collectElementsIfNoFlushes;
Review comment:
We should know that at construction time since we should know the runner
capability before we even start processing a bundle allowing for
`collectElementsIfNoFlushes` to be final but this would be difficult with out
solving the multiple aggregator case which doesn't seem challenging to capture
the results for each aggregator and either embed them in the response if you
captured them all or replay them back to the aggregator if one of them returned
null.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]