[ https://issues.apache.org/jira/browse/FLINK-17122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Piotr Nowojski updated FLINK-17122:
-----------------------------------
    Description: 
Currently, when a user defines {{InputSelectable}} or {{BoundedMultiInput}} 
operators, checkpointing is not supported. The main problem is that the 
combination of {{InputSelectable}} and barrier alignment can lead to deadlocks 
(a checkpoint barrier stuck on a channel that is not selected).
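To make the deadlock concrete, here is a toy Java model (not Flink code; every class, method and event name in it is hypothetical). Each input is a pre-filled queue of events. A channel that has delivered its barrier stays blocked until every unfinished input has delivered one (aligned checkpointing). The operator either reads input 1 until its end before touching input 2, or reads any input. The model assumes each channel ends with an "END" event.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy model (not Flink code): pre-filled input channels, aligned barriers, and
// an operator whose input selection may refuse to read one of the channels.
final class AlignedBarrierDeadlockSketch {
    static final String BARRIER = "BARRIER";
    static final String END = "END";

    static Deque<String> channel(String... events) {
        return new ArrayDeque<>(Arrays.asList(events));
    }

    /**
     * Returns "COMPLETED" if every input reaches END, or "DEADLOCK" if no
     * selectable input can be read.
     *
     * @param readFirstInputFully true: select input 1 until it ends, then input 2;
     *                            false: select any unblocked input
     */
    static String run(List<Deque<String>> inputs, boolean readFirstInputFully) {
        int n = inputs.size();
        boolean[] blocked = new boolean[n]; // channel already delivered its barrier
        boolean[] ended = new boolean[n];
        while (true) {
            int selected = -1;
            boolean allEnded = true;
            for (int i = 0; i < n; i++) {
                if (ended[i]) continue;
                allEnded = false;
                if (readFirstInputFully || !blocked[i]) { selected = i; break; }
            }
            if (allEnded) return "COMPLETED";
            // The selected input is held back by alignment, while the input that
            // still owes its barrier is never selected: nothing can progress.
            if (selected == -1 || blocked[selected]) return "DEADLOCK";
            String event = inputs.get(selected).poll();
            if (event == null) throw new IllegalStateException("channels must end with END");
            if (event.equals(END)) {
                ended[selected] = true;
            } else if (event.equals(BARRIER)) {
                blocked[selected] = true;
            } // any other event is an ordinary record and is simply consumed
            // Alignment completes once every unfinished input has delivered its
            // barrier; the checkpoint would be taken here and inputs unblocked.
            boolean aligned = true;
            boolean anyBlocked = false;
            for (int i = 0; i < n; i++) {
                if (blocked[i]) anyBlocked = true;
                else if (!ended[i]) aligned = false;
            }
            if (aligned && anyBlocked) Arrays.fill(blocked, false);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(channel("a", BARRIER, "b", END),
                channel("x", BARRIER, "y", END)), true));  // DEADLOCK
        System.out.println(run(Arrays.asList(channel("a", BARRIER, "b", END),
                channel("x", BARRIER, "y", END)), false)); // COMPLETED
    }
}
```

With the "input 1 fully first" selection the model reports DEADLOCK: input 1 is held back by alignment, while input 2, which still owes its barrier, is never selected. Letting the operator read any available input allows alignment to complete.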

The problem could be mitigated to some extent via unaligned checkpoints 
(FLINK-14551), but not fully. Even with unaligned checkpoints, checkpoint 
barriers can be stuck in the job graph if there is a {{flatMap}} operator (or a 
non-{{flatMap}} operator whose records span multiple buffers) that is blocked in 
the middle of processing a record by some downstream input selection. In such a 
case we are not able to perform an unaligned checkpoint.
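The following hypothetical sketch (again a toy model, not Flink's implementation) shows why unaligned checkpoints do not help in that situation. `fanOut` can be read either as the number of records a {{flatMap}} emits for one input record or as the number of buffers a single large record needs; `outputCapacity` is the free space in an output buffer that no downstream input selection ever drains. The model assumes the task thread can only react to a barrier between records.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy model (not Flink code): a task whose flatMap emits `fanOut` records per
// input record into an output buffer holding `outputCapacity` records. Nothing
// drains the buffer, because the downstream input selection never picks this
// channel. An unaligned barrier is only handled between records.
final class UnalignedBarrierStuckSketch {
    static Deque<String> channel(String... events) {
        return new ArrayDeque<>(Arrays.asList(events));
    }

    static String run(Deque<String> input, int fanOut, int outputCapacity) {
        int buffered = 0; // records waiting in the undrained output buffer
        while (!input.isEmpty()) {
            String event = input.poll();
            if (event.equals("BARRIER")) {
                // Between records: the unaligned checkpoint can start and would
                // persist the in-flight output buffer as part of the snapshot.
                return "CHECKPOINT_STARTED";
            }
            for (int i = 0; i < fanOut; i++) { // flatMap body for one input record
                if (buffered == outputCapacity) {
                    // No space and no consumer: the thread waits inside flatMap
                    // forever and never reaches the pending barrier.
                    return "STUCK_MID_RECORD";
                }
                buffered++;
            }
        }
        return "NO_BARRIER";
    }

    public static void main(String[] args) {
        System.out.println(run(channel("r", "BARRIER"), 3, 2)); // STUCK_MID_RECORD
        System.out.println(run(channel("r", "BARRIER"), 1, 2)); // CHECKPOINT_STARTED
    }
}
```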

A potential solution could be to use persistent communication channels, or to 
detect the aforementioned situations and avoid the deadlock by spilling excess 
data.
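One way the spilling idea could look, as a minimal sketch that assumes spilled data only has to keep FIFO order: the channel keeps a bounded in-memory part, and anything that does not fit goes to an overflow list standing in for a spill file, so the producer never blocks. `SpillingChannelSketch` is a hypothetical name; a real implementation would also need disk I/O and memory accounting.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, not Flink code: a channel that never back-pressures its
// producer. The in-memory part is bounded; overflow goes to `spilled`, a plain
// list standing in for a spill file on local disk.
final class SpillingChannelSketch {
    private final int memoryCapacity;
    private final Deque<String> inMemory = new ArrayDeque<>();
    private final List<String> spilled = new ArrayList<>();

    SpillingChannelSketch(int memoryCapacity) {
        this.memoryCapacity = memoryCapacity;
    }

    /** Never blocks. Once anything is spilled, later records are spilled too, keeping FIFO order. */
    void emit(String record) {
        if (spilled.isEmpty() && inMemory.size() < memoryCapacity) {
            inMemory.add(record);
        } else {
            spilled.add(record);
        }
    }

    /** Consumer side: in-memory records first, then spilled ones; null if empty. */
    String poll() {
        String next = inMemory.poll();
        if (next == null && !spilled.isEmpty()) {
            next = spilled.remove(0);
        }
        return next;
    }

    int spilledCount() {
        return spilled.size();
    }

    public static void main(String[] args) {
        SpillingChannelSketch channel = new SpillingChannelSketch(2);
        for (String r : new String[] {"a", "b", "c"}) channel.emit(r); // "c" is spilled
        System.out.println(channel.spilledCount()); // 1
        System.out.println(channel.poll() + channel.poll() + channel.poll()); // abc
    }
}
```

Because `emit` never waits for a consumer, an upstream operator in the middle of a {{flatMap}} call can always finish the record and reach a pending barrier; the cost is that excess data lands on disk.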

As most of the problems come from input selections that flip randomly between 
inputs, there might be a partial solution for more trivial cases, such as 
reading one side of an input fully before the other.
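For the trivial case of reading one side fully before the other, the operator's selection changes exactly once, when input 1 ends. The sketch below uses local stand-ins that only mirror the shape of Flink's {{InputSelectable#nextSelection()}} and {{BoundedMultiInput#endInput(int)}} (where input ids start at 1), so that it compiles without Flink; {{SketchInputSelection}} and {{FirstThenSecondSelector}} are hypothetical names.

```java
// Local stand-ins, NOT the real Flink types: they only mirror the shape of
// InputSelectable#nextSelection() and BoundedMultiInput#endInput(int).
enum SketchInputSelection { FIRST, SECOND }

/** Hypothetical two-input operator that reads input 1 to its end, then input 2. */
final class FirstThenSecondSelector {
    private boolean firstInputEnded = false;

    /** Analogous to InputSelectable#nextSelection(): changes exactly once. */
    SketchInputSelection nextSelection() {
        return firstInputEnded ? SketchInputSelection.SECOND : SketchInputSelection.FIRST;
    }

    /** Analogous to BoundedMultiInput#endInput(int); inputId 1 is the first input. */
    void endInput(int inputId) {
        if (inputId == 1) {
            firstInputEnded = true;
        }
    }

    public static void main(String[] args) {
        FirstThenSecondSelector op = new FirstThenSecondSelector();
        System.out.println(op.nextSelection()); // FIRST
        op.endInput(1);
        System.out.println(op.nextSelection()); // SECOND
    }
}
```

Because this selection never flips back, the point at which input 2 becomes readable is known in advance, unlike a selection that alternates between inputs at random.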


> Support InputSelectable and BoundedMultiInput operators with checkpointing
> --------------------------------------------------------------------------
>
>                 Key: FLINK-17122
>                 URL: https://issues.apache.org/jira/browse/FLINK-17122
>             Project: Flink
>          Issue Type: Wish
>          Components: Runtime / Checkpointing, Runtime / Network
>    Affects Versions: 1.9.2, 1.10.0
>            Reporter: Piotr Nowojski
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
