Dear devs,

today I'd like to start the discussion on the Sink API. I have drafted a FLIP [1] with an accompanying PR [2].

This FLIP is a bit special as it actually combines a few smaller amending FLIPs in one. In this discussion, we should decide on the scope and cut out steps that are too invasive if we can't reach an agreement.

Step 1 is to add a few more pieces of information to the context objects. That's non-breaking and needed for the async communication pattern in FLIP-171 [3]. Although we need to add a new Public API (MailboxExecutor), I think this step should require the least discussion.

Step 2 is to offer the same context information to committers as well. Here we can provide some compatibility methods so that existing sinks don't break. The main use case would be an async exactly-once sink, but I'm not sure whether we would use async communication patterns here at all (or simply wait synchronously for all async requests to finish). It may also help with async cleanup tasks, though.

While drafting step 2, I noticed how heavily entangled the current API is. To figure out whether there is a committer during stream graph creation, we actually need to create a committer, which can have unforeseen consequences. Thus, I did a spike to see whether we can disentangle the interface and have separate interfaces for the different use cases. The resulting step 3 would be a completely breaking change and is thus probably controversial. However, I'd also see the disentanglement as a way to prepare for making sinks more expressive (write and commit coordinator) without completely overloading the main interface.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-177%3A+Extend+Sink+API
[2] https://github.com/apache/flink/pull/16399
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink
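
To make the entanglement behind step 3 a bit more concrete, here is a minimal sketch. Every type in it (SketchCommitter, EntangledSink, BaseSink, CommittingSink, GraphBuilderSketch and the two counting sinks) is a hypothetical, heavily simplified stand-in that I made up for illustration; it is neither the current Sink interface nor the code in the PR. It only shows the difference between asking a sink to create a committer in order to learn whether it has one, and reading that fact from the sink's type.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a committer; not the real Flink interface.
interface SketchCommitter {
    void commit(String committable);
}

// Entangled shape (assumption): one interface, the committer is optional,
// so the only way to learn whether it exists is to create it.
interface EntangledSink {
    Optional<SketchCommitter> createCommitter();
}

// Disentangled shape (assumption): the capability is part of the type.
interface BaseSink {
}

interface CommittingSink extends BaseSink {
    SketchCommitter createCommitter();
}

// Test sinks that count how often a committer was actually created.
final class CountingEntangledSink implements EntangledSink {
    final AtomicInteger created = new AtomicInteger();

    @Override
    public Optional<SketchCommitter> createCommitter() {
        created.incrementAndGet(); // side effect, e.g. opening a connection
        return Optional.of(committable -> { });
    }
}

final class CountingCommittingSink implements CommittingSink {
    final AtomicInteger created = new AtomicInteger();

    @Override
    public SketchCommitter createCommitter() {
        created.incrementAndGet();
        return committable -> { };
    }
}

// Hypothetical stand-in for the check done during stream graph creation.
final class GraphBuilderSketch {
    // Has to instantiate a committer just to answer a yes/no question.
    static boolean hasCommitterEntangled(EntangledSink sink) {
        return sink.createCommitter().isPresent();
    }

    // Answers the same question from the type alone; nothing is created.
    static boolean hasCommitterDisentangled(BaseSink sink) {
        return sink instanceof CommittingSink;
    }
}

final class SinkSketchDemo {
    public static void main(String[] args) {
        CountingEntangledSink entangled = new CountingEntangledSink();
        CountingCommittingSink disentangled = new CountingCommittingSink();

        System.out.println(GraphBuilderSketch.hasCommitterEntangled(entangled)
                + ", committers created: " + entangled.created.get());
        System.out.println(GraphBuilderSketch.hasCommitterDisentangled(disentangled)
                + ", committers created: " + disentangled.created.get());
    }
}
```

In this sketch both checks answer "yes", but only the entangled variant has created a committer as a side effect of asking. Whether separate interfaces are worth a breaking change is exactly what I'd like to discuss.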