[
https://issues.apache.org/jira/browse/SAMZA-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372265#comment-17372265
]
Cameron Lee commented on SAMZA-2303:
------------------------------------
See [https://github.com/apache/samza/pull/1506] for some more
context/discussion about this.
> Exclude side inputs when handling end-of-stream and watermarks for high-level
> -----------------------------------------------------------------------------
>
> Key: SAMZA-2303
> URL: https://issues.apache.org/jira/browse/SAMZA-2303
> Project: Samza
> Issue Type: Bug
> Reporter: Cameron Lee
> Priority: Major
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> OperatorImplGraph builds EndOfStreamStates and WatermarkStates objects with
> all of the input SSPs from the job model. That includes side-input SSPs.
> However, high-level operator tasks aren't given messages from side-input
> SSPs, so high-level operators should not need to include handling for
> end-of-stream and watermarks.
> The result of this issue is that end-of-stream and watermark handling tries
> to include side-inputs but never updates those states, which can result in
> not exiting properly (end-of-stream) and not correctly calculating watermarks.
> We currently have tests which use partitionBy and side-inputs, but they only
> use a single partition, so RunLoop is able to shutdown the task (RunLoop
> doesn't check side inputs when determining if the task is at the end of all
> streams). Normally, OperatorImpl will shut down the task when using
> high-level, and I think changing OperatorImpl to do ignore side input SSPs so
> that it does shut down the task is the fix.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)