suganya created BEAM-3494:
-----------------------------
Summary: Snapshot state of aggregated data of apache beam project
is not maintained in flink's checkpointing
Key: BEAM-3494
URL: https://issues.apache.org/jira/browse/BEAM-3494
Project: Beam
Issue Type: Bug
Components: sdk-java-core
Reporter: suganya
Assignee: Kenneth Knowles
We have a beam project which consumes events from kafka,does a groupby in a
time window(5 mins),after window elapses it pushes the events to downstream for
merge.This project is deployed using flink ,we have enabled checkpointing to
recover from failed state.
(windowsize: 5mins , checkpointingInterval: 5mins,state.backend: filesystem)
Offsets from kafka get checkpointed every 5 mins(checkpointingInterval).Before
finishing the entire DAG(groupBy and merge) , events offsets are getting
checkpointed.So incase of any restart from task-manager ,new task gets started
from last successful checkpoint ,but we could'nt able to get the aggregated
snapshot data(data from groupBy task) from the persisted checkpoint.
Able to retrieve the last successful checkpointed offset from kafka ,but
couldnt able to get last aggregated data till checkpointing.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)