Thanks wudi for the updating, the FLIP generally looks good to me, I only left two minor suggestions:
(1) The suffix `.s` in configoption doris.request.query.timeout.s looks strange to me, could we change all time interval related option value type to Duration ? (2) Could you check and improve all config options like `doris.exec.mem.limit` to make them to follow flink config option naming and value type? Best, Leonard > > >> 2024年3月6日 06:12,Jing Ge <j...@ververica.com.INVALID> 写道: >> >> Hi Di, >> >> Thanks for your proposal. +1 for the contribution. I'd like to know your >> thoughts about the following questions: >> >> 1. According to your clarification of the exactly-once, thanks for it BTW, >> no PreCommitTopology is required. Does it make sense to let DorisSink[1] >> implement SupportsCommitter, since the TwoPhaseCommittingSink is >> deprecated[2] before turning the Doris connector into a Flink connector? >> 2. OLAP engines are commonly used as the tail/downstream of a data pipeline >> to support further e.g. ad-hoc query or cube with feasible pre-aggregation. >> Just out of curiosity, would you like to share some real use cases that >> will use OLAP engines as the source of a streaming data pipeline? Or it >> will only be used as the source for the batch? >> 3. The E2E test only covered sink[3], if I am not mistaken. Would you like >> to test the source in E2E too? >> >> [1] >> https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/main/java/org/apache/doris/flink/sink/DorisSink.java#L55 >> [2] >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API >> [3] >> https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/test/java/org/apache/doris/flink/tools/cdc/MySQLDorisE2ECase.java#L96 >> >> Best regards, >> Jing >> >> On Tue, Mar 5, 2024 at 11:18 AM wudi <676366...@qq.com.invalid> wrote: >> >>> Hi, Jeyhun Karimov. >>> Thanks for your question. >>> >>> - How to ensure Exactly-Once? >>> 1. When the Checkpoint Barrier arrives, DorisSink will trigger the >>> precommit api of StreamLoad to complete the persistence of data in Doris >>> (the data will not be visible at this time), and will also pass this TxnID >>> to the Committer. >>> 2. When this Checkpoint of the entire Job is completed, the Committer will >>> call the commit api of StreamLoad and commit TxnID to complete the >>> visibility of the transaction. >>> 3. When the task is restarted, the Txn with successful precommit and >>> failed commit will be aborted based on the label-prefix, and Doris' abort >>> API will be called. (At the same time, Doris will also abort transactions >>> that have not been committed for a long time) >>> >>> ps: At the same time, this part of the content has been updated in FLIP >>> >>> - Because the default table model in Doris is Duplicate ( >>> https://doris.apache.org/docs/data-table/data-model/), which does not >>> have a primary key, batch writing may cause data duplication, but UNIQ The >>> model has a primary key, which ensures the idempotence of writing, thus >>> achieving Exactly-Once >>> >>> Brs, >>> di.wu >>> >>> >>>> 2024年3月2日 17:50,Jeyhun Karimov <je.kari...@gmail.com> 写道: >>>> >>>> Hi, >>>> >>>> Thanks for the proposal. +1 for the FLIP. >>>> I have a few questions: >>>> >>>> - How exactly the two (Stream Load's two-phase commit and Flink's >>> two-phase >>>> commit) combination will ensure the e2e exactly-once semantics? >>>> >>>> - The FLIP proposes to combine Doris's batch writing with the primary key >>>> table to achieve Exactly-Once semantics. Could you elaborate more on >>> that? >>>> Why it is not the default behavior but a workaround? >>>> >>>> Regards, >>>> Jeyhun >>>> >>>> On Sat, Mar 2, 2024 at 10:14 AM Yanquan Lv <decq12y...@gmail.com> wrote: >>>> >>>>> Thanks for driving this. >>>>> The content is very detailed, it is recommended to add a section on Test >>>>> Plan for more completeness. >>>>> >>>>> Di Wu <d...@apache.org> 于2024年1月25日周四 15:40写道: >>>>> >>>>>> Hi all, >>>>>> >>>>>> Previously, we had some discussions about contributing Flink Doris >>>>>> Connector to the Flink community [1]. I want to further promote this >>>>> work. >>>>>> I hope everyone will help participate in this FLIP discussion and >>> provide >>>>>> more valuable opinions and suggestions. >>>>>> Thanks. >>>>>> >>>>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p >>>>>> >>>>>> Brs, >>>>>> di.wu >>>>>> >>>>>> >>>>>> >>>>>> On 2023/12/07 05:02:46 wudi wrote: >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> As discussed in the previous email [1], about contributing the Flink >>>>>> Doris Connector to the Flink community. >>>>>>> >>>>>>> >>>>>>> Apache Doris[2] is a high-performance, real-time analytical database >>>>>> based on MPP architecture, for scenarios where Flink is used for data >>>>>> analysis, processing, or real-time writing on Doris, Flink Doris >>>>> Connector >>>>>> is an effective tool. >>>>>>> >>>>>>> At the same time, Contributing Flink Doris Connector to the Flink >>>>>> community will further expand the Flink Connectors ecosystem. >>>>>>> >>>>>>> So I would like to start an official discussion FLIP-399: Flink >>>>>> Connector Doris[3]. >>>>>>> >>>>>>> Looking forward to comments, feedbacks and suggestions from the >>>>>> community on the proposal. >>>>>>> >>>>>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p >>>>>>> [2] >>>>> https://doris.apache.org/docs/dev/get-starting/what-is-apache-doris/ >>>>>>> [3] >>>>>> >>>>> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-399%3A+Flink+Connector+Doris >>>>>>> >>>>>>> >>>>>>> Brs, >>>>>>> >>>>>>> di.wu >>>>>>> >>>>>> >>>>> >>> >>> >