[
https://issues.apache.org/jira/browse/FLINK-19382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206056#comment-17206056
]
Arvid Heise commented on FLINK-19382:
-------------------------------------
Hi Theo,
this sounds like an interesting idea. I can see quite a few application profit
from such a state backend.
There are some related workstreams and I was thinking that we could align them
with your idea:
* Persistent channels: Network data is stored in some storage system (could be
Kafka, Bookkeeper, or some custom implementation). Failover can pretty much
immediately restart from the previous state of the graph without needing to
replay from the last checkpoint.
* Durable short term log: stores changes to the state backend on some durable
medium such that on failure, any task manager could take over.
[~sewen] do you think that we could implement ReplayableSource with one of
these building blocks?
> Introducing ReplayableSourceStateBackend
> ----------------------------------------
>
> Key: FLINK-19382
> URL: https://issues.apache.org/jira/browse/FLINK-19382
> Project: Flink
> Issue Type: Improvement
> Reporter: Theo Diefenthal
> Priority: Major
>
> I got the idea of a new StateBackend simply called "ReplayableSource". This
> statebackend would be bound by a number of limitations, but in the few areas,
> it could improve the pipeline performance by magnitudes which makes me think
> it's worth debating about it.
> I'd like to start with describing two useful scenarios for such a state
> backend before debating more about the backend. Both scenarios share that
> they read data from kafka and process that via Flink.
> Scenario 1: Buffering data. Currently I'm developing a pipeline where
> directly post to reading the data, I need to buffer it for 1 minute in event
> time. This is due to inter-event-dependencies: If within one minute a certain
> event arrives, I have to enrich information to the first event. Right now, I
> store the 1 minute event time data in Flink State leading to checkpointing
> the entire buffer on each checkpoint and making it impossible for me to use
> exactly-once processing (We want to have as low latency as possible, i.e.
> after buffering at maximum a few seconds while simoulatenously having high
> volume of data).
> Scenario 2: Performing Flink SQL CEP Queries. CEP Queries naturally have
> inter-event-dependencies. Having simple MATCH_RECOGNIZE queries directly post
> to a kafka source often lead to requiring RocksDB state backend and slow
> performance.
>
> The idea: Instead of storing the entire state, we could simply store a kafka
> offset. When restoring the state from savepoint, all the state could be
> restored by reading from kafka. The checkpoint size would thus be reduced
> from huge sizes down to just a few numbers which allows frequent and fast
> checkpointing.
>
> Limitations:
> * This would only work for fully deterministic/replayable streaming jobs. If
> a certain operator within the pipeline is not determinstic, a replay could
> cause another result.
> * The source must be replayable, e.g. kafka
> * This would also only work for "short-state-living" pipelines. There are
> many pipelines which build up their state over days, month or even years.
> Restoring such a state by replaying all the data over that time would be
> almost impossible, especially as kafka usually has a retention of something
> like a week configured. However, there are also many queries with
> short-lived-state like the mentioned CEP usecase where one usually have
> patterns defined in timeframes of second, minutes, hours or a few days for
> the event correlation.
> * Not sure if there are more limitations with regards to
> windowing/watermarks or similar things which would make that feature
> impossible!?
> For certain scenarios, this feature would obviously be dumb, most likely for
> windows pipelines. It is certainly much cheapter to e.g. store a COUNT per
> window then replaying all events per window in order to restore that COUNT.
> But I'm focusing on something like CEP.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)