[ 
https://issues.apache.org/jira/browse/SAMZA-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cameron Lee updated SAMZA-2303:
-------------------------------
    Description: 
OperatorImplGraph builds EndOfStreamStates and WatermarkStates objects with all 
of the input SSPs from the job model. That includes side-input SSPs. However, 
high-level operator tasks aren't given messages from side-input SSPs, so 
high-level operators should not need to include handling for end-of-stream and 
watermarks.

The result of this issue is that end-of-stream and watermark handling tries to 
include side-inputs but never updates those states, which can result in not 
exiting properly (end-of-stream) and not correctly calculating watermarks.

We currently have tests which use partitionBy and side-inputs, but they only 
use a single partition, so RunLoop is able to shutdown the task (RunLoop 
doesn't check side inputs when determining if the task is at the end of all 
streams). Normally, OperatorImpl will shut down the task when using high-level, 
and I think changing OperatorImpl to do ignore side input SSPs so that it does 
shut down the task is the fix.

  was:
OperatorImplGraph builds EndOfStreamStates and WatermarkStates objects with all 
of the input SSPs from the job model. That includes side-input SSPs. However, 
high-level operator tasks aren't given messages from side-input SSPs, so 
high-level operators should not need to include handling for end-of-stream and 
watermarks.

The result of this issue is that end-of-stream and watermark handling tries to 
include side-inputs but never updates those states, which can result in not 
exiting properly (end-of-stream) and not correctly calculating watermarks.


> Exclude side inputs when handling end-of-stream and watermarks for high-level
> -----------------------------------------------------------------------------
>
>                 Key: SAMZA-2303
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2303
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Cameron Lee
>            Priority: Major
>
> OperatorImplGraph builds EndOfStreamStates and WatermarkStates objects with 
> all of the input SSPs from the job model. That includes side-input SSPs. 
> However, high-level operator tasks aren't given messages from side-input 
> SSPs, so high-level operators should not need to include handling for 
> end-of-stream and watermarks.
> The result of this issue is that end-of-stream and watermark handling tries 
> to include side-inputs but never updates those states, which can result in 
> not exiting properly (end-of-stream) and not correctly calculating watermarks.
> We currently have tests which use partitionBy and side-inputs, but they only 
> use a single partition, so RunLoop is able to shutdown the task (RunLoop 
> doesn't check side inputs when determining if the task is at the end of all 
> streams). Normally, OperatorImpl will shut down the task when using 
> high-level, and I think changing OperatorImpl to do ignore side input SSPs so 
> that it does shut down the task is the fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to