[ 
https://issues.apache.org/jira/browse/FLINK-3692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223946#comment-15223946
 ] 

Stephan Ewen commented on FLINK-3692:
-------------------------------------

Nice idea.

The main difference between Flink and Samza is that Flink may process a lot and 
then "bulk commit" changes as part of a checkpoint. If a failure occurs before 
that, it may be that parts of the changes will have to be discarded. The data 
read for recovery may hence be only parts of the Kafka log, not all of it. That 
means the implementation will require a bit of finesse, but it should be 
possible.

*Concerning savepoints*
I think we should start to strictly separate those from checkpoints. A 
savepoint means a full state snapshot is written into some stream, which 
checkpoints can work differently. So, this should not be a long term blocker.

*Related*
We are also looking into replicating checkpoints between Flink worker 
processes. That would also eliminate the HDFS dependency for checkpoints.

> Develop a Kafka state backend
> -----------------------------
>
>                 Key: FLINK-3692
>                 URL: https://issues.apache.org/jira/browse/FLINK-3692
>             Project: Flink
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Elias Levy
>
> Flink clusters usually consume of a Kafka cluster.  It simplify operations if 
> Flink could store its state checkpoints in Kafka.  This should be possibly 
> using different topics to write to, partitioning appropriately, and using 
> compacted topics.  This would avoid the need to run an HDFS cluster just to 
> store Flink checkpoints.
> For inspiration you may want to take a look at how Samza checkpoints a task's 
> local state to a Kafka topic, and how the newer Kafka consumers checkpoint 
> their offsets to Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to