Hi Eron,

Very interesting idea to support exactly-once semantics for sinks via
Git! I would be curious about the performance of such a sink.

Since this currently works on local file systems only (it throws an
Exception otherwise), I wonder how it behaves on failures when the
"git-${subtaskIndex}" directory is not available on a node. We might
lose some of the exactly-once semantics because task deployment
is not deterministic, so a restarted subtask may end up on a different
node than before.
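
To make the concern concrete, here is a minimal sketch in plain Java. It
is not taken from flink-git: the class and method names are made up, and
two temporary directories merely stand in for the local disks of two
nodes. It only shows that a directory created for a subtask on one node
is not visible when the same subtask index is looked up on another node.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not code from flink-git: models why state kept in a
// node-local "git-${subtaskIndex}" directory can be missing after a restart.
class LocalSubtaskDir {

    // Resolve the per-subtask working directory under a node-local base path.
    static Path resolve(Path nodeLocalBase, int subtaskIndex) {
        return nodeLocalBase.resolve("git-" + subtaskIndex);
    }

    // Simulate a subtask creating its directory on the node it runs on.
    static Path create(Path nodeLocalBase, int subtaskIndex) {
        try {
            return Files.createDirectories(resolve(nodeLocalBase, subtaskIndex));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if earlier local state for this subtask is visible on this node.
    static boolean hasLocalState(Path nodeLocalBase, int subtaskIndex) {
        return Files.isDirectory(resolve(nodeLocalBase, subtaskIndex));
    }

    // Create a fresh temp directory standing in for one node's local disk.
    static Path newNode(String name) {
        try {
            return Files.createTempDirectory(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path nodeA = newNode("nodeA");
        Path nodeB = newNode("nodeB");
        create(nodeA, 3); // first run: subtask 3 happens to be on node A
        System.out.println("same node:  " + hasLocalState(nodeA, 3));
        System.out.println("other node: " + hasLocalState(nodeB, 3));
    }
}
```

If subtask 3 is redeployed to the second node after a failure, the
lookup finds nothing there, which is the situation I am asking about.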

Nevertheless, very elegant hack!

Cheers,
Max

On Sat, Apr 23, 2016 at 12:23 AM, Eron Wright <ewri...@live.com> wrote:
> Hello,
> On a long plane trip I had some fun with writing a Flink streaming connector 
> based on Git.   https://github.com/EronWright/flink-git
> Not intended for real application use; flink-git is just an experiment meant 
> for discussion.
> Flink's Kafka connector provides exactly-once guarantees when acting as a 
> source (consumer) but not as a sink (producer), due to a limitation of Kafka. 
>  This limitation invites the question of how to extend Kafka (or a similar 
> system) to provide exactly-once guarantees for a sink. Since Kafka is 
> envisioned as a commit log, may an answer be found in commit log concepts? 
> The flink-git repository explores that possibility.
> Git provides a useful conceptual framework for the investigation, since its 
> concepts are familiar and it is easily programmable with jgit. The flink-git 
> repository is thus an experimental connector, based on jgit, that explores 
> providing exactly-once guarantees as both a source and as a sink.
> Enjoy,
> Eron Wright
>
