Hi Eron,

Very interesting idea to support exactly-once semantics for sinks via Git! I would be curious about the performance of such a sink.
Since this currently works on local file systems only (it throws an Exception otherwise), I wonder how it behaves on failure when the "git-${subtaskIndex}" directory is not available on a node. We might lose some of the exactly-once semantics because task deployment is not deterministic, so a restarted subtask may land on a node that does not have its directory.

Nevertheless, very elegant hack!

Cheers,
Max

On Sat, Apr 23, 2016 at 12:23 AM, Eron Wright <ewri...@live.com> wrote:
> Hello,
> On a long plane trip I had some fun with writing a Flink streaming connector
> based on Git. https://github.com/EronWright/flink-git
> Not intended for real application use; flink-git is just an experiment meant
> for discussion.
> Flink's Kafka connector provides exactly-once guarantees when acting as a
> source (consumer) but not as a sink (producer), due to a limitation of Kafka.
> This limitation invites the question of how to extend Kafka (or a similar
> system) to provide exactly-once guarantees for a sink. Since Kafka is
> envisioned as a commit log, may an answer be found in commit log concepts?
> The flink-git repository explores that possibility.
> Git provides a useful conceptual framework for the investigation, since its
> concepts are familiar and it is easily programmable with jgit. The flink-git
> repository is thus an experimental connector, based on jgit, that explores
> providing exactly-once guarantees as both a source and as a sink.
> Enjoy,
> Eron Wright