vvcephei commented on a change in pull request #10462:
URL: https://github.com/apache/kafka/pull/10462#discussion_r612589665



##########
File path: streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamImplJoin.java
##########
@@ -118,20 +132,40 @@
         final ProcessorGraphNode<K1, V2> otherWindowedStreamsNode = new ProcessorGraphNode<>(otherWindowStreamProcessorName, otherWindowStreamProcessorParams);
         builder.addGraphNode(otherGraphNode, otherWindowedStreamsNode);
 
+        Optional<StoreBuilder<WindowStore<KeyAndJoinSide<K1>, ValueOrOtherValue<V1, V2>>>> outerJoinWindowStore = Optional.empty();
+        if (leftOuter || rightOuter) {
+            final String outerJoinSuffix = "-shared-outer-join-store";
+            final String outerJoinStoreGeneratedName = builder.newProcessorName(KStreamImpl.OUTERSHARED_NAME);
+            final String outerJoinStoreName = userProvidedBaseStoreName == null ? outerJoinStoreGeneratedName : userProvidedBaseStoreName + outerJoinSuffix;
+
+            outerJoinWindowStore = Optional.of(outerJoinWindowStoreBuilder(outerJoinStoreName, windows, streamJoinedInternal));
+        }
+
+        // Time shared between joins to keep track of the maximum stream time

Review comment:
It's extremely subtle, but we cannot use `context.streamTime()` here because
of the time-delay effects of upstream record caches. This was the cause of a
severe bug in `suppress` that went undetected until after it was released.

For example: if there is a record cache upstream of this join, it delays the
propagation of records (and their accompanying timestamps) by some amount of
time `D`. Say the task ingests a record with timestamp `T`. If we reference the
context's stream time, our processor will think it is at time `T`, when the
records it has actually received only reach time `T - D`. It would then behave
wrongly, for example by enforcing the grace period prematurely, which manifests
to users as data loss.
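
To make the `T` versus `T - D` gap concrete, here is a toy simulation in plain Java with no Kafka dependency. Everything in it is hypothetical and for illustration only: the class `ObservedStreamTimeSketch`, the helper `ObservedStreamTime`, and the cache model, which simply holds each record until the newest ingested timestamp is `D` ahead of it. Real record caches flush on commit or eviction rather than on a fixed delay, so treat this as a sketch of the effect and not of the actual implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ObservedStreamTimeSketch {
    static final long NO_TIMESTAMP = -1L;

    /** Hypothetical helper: max timestamp of the records a processor has actually received. */
    static final class ObservedStreamTime {
        private long max = NO_TIMESTAMP;

        void advance(final long recordTimestamp) {
            max = Math.max(max, recordTimestamp);
        }

        long get() {
            return max;
        }
    }

    /**
     * Ingests one record per timestamp in 0..lastTimestamp through a toy upstream
     * cache that delays forwarding by cacheDelay.
     * Returns {task-level stream time, stream time observed by the downstream processor}.
     */
    static long[] simulate(final long lastTimestamp, final long cacheDelay) {
        long taskStreamTime = NO_TIMESTAMP;
        final ObservedStreamTime observed = new ObservedStreamTime();
        final Deque<Long> upstreamCache = new ArrayDeque<>();

        for (long ts = 0; ts <= lastTimestamp; ts++) {
            // The task-level clock advances as soon as a record is ingested.
            taskStreamTime = Math.max(taskStreamTime, ts);
            upstreamCache.addLast(ts);

            // The toy cache forwards a record only once it is cacheDelay behind the newest one.
            while (!upstreamCache.isEmpty() && upstreamCache.peekFirst() <= ts - cacheDelay) {
                observed.advance(upstreamCache.pollFirst());
            }
        }
        return new long[] {taskStreamTime, observed.get()};
    }

    public static void main(final String[] args) {
        final long[] times = simulate(100L, 30L);
        System.out.println("task stream time     = " + times[0]);
        System.out.println("observed stream time = " + times[1]);
    }
}
```

With `T = 100` and `D = 30`, the task-level clock reads 100 while the downstream processor has only seen timestamps up to 70. In this toy model, a grace-period check driven by the task-level clock would run 30 time units ahead of the records actually delivered, whereas a clock advanced from each received record's own timestamp stays in step with them.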




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

