Hi all,

A couple of first comments on this:

1. I'm missing the problem statement in the overall introduction. It immediately goes into proposal mode; I would like to first read what the actual problem is before diving into solutions.
2. "Each ETL job creates snapshots with checkpoint info on sink tables in Table Store" -> That reads like you're proposing that snapshots need to be written to Table Store?
3. If you introduce a MetaService, it becomes a single point of failure because it coordinates everything. But I can't find anything in the FLIP on making the MetaService highly available or on how to deal with failovers there.
4. Under Rejected Alternatives, the FLIP states "Currently watermark in Flink cannot align data.", which is not true, given that there is FLIP-182: https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
5. Given the MetaService's role, it feels like this introduces a tight dependency between Flink and the Table Store. How pluggable is this solution, given the changes that need to be made to Flink in order to support it?

Best regards,

Martijn

On Thu, Dec 1, 2022 at 4:49 AM Shammon FY <zjur...@gmail.com> wrote:

> Hi devs:
>
> I'd like to start a discussion about FLIP-276: Data Consistency of
> Streaming and Batch ETL in Flink and Table Store[1]. In the whole data
> stream processing, there are consistency problems such as how to manage the
> dependencies of multiple jobs and tables, how to define and handle E2E
> delays, and how to ensure the data consistency of queries on flowing data.
> This FLIP aims to support data consistency and answer these questions.
>
> I've discussed the details of this FLIP with @Jingsong Lee and @libenchao
> offline several times. We hope to support data consistency of queries on
> tables, managing relationships between Flink jobs and tables, and revising
> tables on streaming in Flink and Table Store to improve the whole data
> stream processing.
>
> Looking forward to your feedback.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-276%3A+Data+Consistency+of+Streaming+and+Batch+ETL+in+Flink+and+Table+Store
>
> Best,
> Shammon