Hi Di Thank you for initiating this FLIP, +1 for this.
Regarding the option `doris.filter.query` of doris source table Can we directly implement the FilterPushDown capability of Flink Source like Jdbc Source [1] instead of introducing an option? Regarding two-phase commit, > At the same time, Doris will also abort transactions that have not been committed for a long time Can we control the transaction timeout in the connector? And control the behavior when timeout occurs, whether to discard by default or trigger job failure? [1]. https://issues.apache.org/jira/browse/FLINK-16024 Best, Feng On Tue, Mar 12, 2024 at 12:12 AM Ferenc Csaky <ferenc.cs...@pm.me.invalid> wrote: > Hi, > > Thanks for driving this, +1 for the FLIP. > > Best, > Ferenc > > > > > On Monday, March 11th, 2024 at 15:17, Ahmed Hamdy <hamdy10...@gmail.com> > wrote: > > > > > > > Hello, > > Thanks for the proposal, +1 for the FLIP. > > > > Best Regards > > Ahmed Hamdy > > > > > > On Mon, 11 Mar 2024 at 15:12, wudi 676366...@qq.com.invalid wrote: > > > > > Hi, Leonard > > > Thank you for your suggestion. > > > I referred to other Connectors[1], modified the naming and types of > > > relevant parameters[2], and also updated FLIP. > > > > > > [1] > > > > https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/overview/ > > > [1] > > > > https://github.com/apache/doris-flink-connector/blob/master/flink-doris-connector/src/main/java/org/apache/doris/flink/table/DorisConfigOptions.java > > > > > > Brs, > > > di.wu > > > > > > > 2024年3月7日 14:33,Leonard Xu xbjt...@gmail.com 写道: > > > > > > > > Thanks wudi for the updating, the FLIP generally looks good to me, I > > > > only left two minor suggestions: > > > > > > > > (1) The suffix `.s` in configoption doris.request.query.timeout.s > looks > > > > strange to me, could we change all time interval related option > value type > > > > to Duration ? > > > > > > > > (2) Could you check and improve all config options like > > > > `doris.exec.mem.limit` to make them to follow flink config option > naming > > > > and value type? > > > > > > > > Best, > > > > Leonard > > > > > > > > > > 2024年3月6日 06:12,Jing Ge j...@ververica.com.INVALID 写道: > > > > > > > > > > > > Hi Di, > > > > > > > > > > > > Thanks for your proposal. +1 for the contribution. I'd like to > know > > > > > > your > > > > > > thoughts about the following questions: > > > > > > > > > > > > 1. According to your clarification of the exactly-once, thanks > for it > > > > > > BTW, > > > > > > no PreCommitTopology is required. Does it make sense to let > > > > > > DorisSink[1] > > > > > > implement SupportsCommitter, since the TwoPhaseCommittingSink is > > > > > > deprecated[2] before turning the Doris connector into a Flink > > > > > > connector? > > > > > > 2. OLAP engines are commonly used as the tail/downstream of a > data > > > > > > pipeline > > > > > > to support further e.g. ad-hoc query or cube with feasible > > > > > > pre-aggregation. > > > > > > Just out of curiosity, would you like to share some real use > cases that > > > > > > will use OLAP engines as the source of a streaming data > pipeline? Or it > > > > > > will only be used as the source for the batch? > > > > > > 3. The E2E test only covered sink[3], if I am not mistaken. > Would you > > > > > > like > > > > > > to test the source in E2E too? > > > > > > > > > > > > [1] > > > > > > > https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/main/java/org/apache/doris/flink/sink/DorisSink.java#L55 > > > > > > > > > [2] > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API > > > > > > > > > [3] > > > > > > > https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/test/java/org/apache/doris/flink/tools/cdc/MySQLDorisE2ECase.java#L96 > > > > > > > > > Best regards, > > > > > > Jing > > > > > > > > > > > > On Tue, Mar 5, 2024 at 11:18 AM wudi 676366...@qq.com.invalid > wrote: > > > > > > > > > > > > > Hi, Jeyhun Karimov. > > > > > > > Thanks for your question. > > > > > > > > > > > > > > - How to ensure Exactly-Once? > > > > > > > 1. When the Checkpoint Barrier arrives, DorisSink will trigger > the > > > > > > > precommit api of StreamLoad to complete the persistence of > data in > > > > > > > Doris > > > > > > > (the data will not be visible at this time), and will also > pass this > > > > > > > TxnID > > > > > > > to the Committer. > > > > > > > 2. When this Checkpoint of the entire Job is completed, the > Committer > > > > > > > will > > > > > > > call the commit api of StreamLoad and commit TxnID to complete > the > > > > > > > visibility of the transaction. > > > > > > > 3. When the task is restarted, the Txn with successful > precommit and > > > > > > > failed commit will be aborted based on the label-prefix, and > Doris' > > > > > > > abort > > > > > > > API will be called. (At the same time, Doris will also abort > > > > > > > transactions > > > > > > > that have not been committed for a long time) > > > > > > > > > > > > > > ps: At the same time, this part of the content has been > updated in > > > > > > > FLIP > > > > > > > > > > > > > > - Because the default table model in Doris is Duplicate ( > > > > > > > https://doris.apache.org/docs/data-table/data-model/), which > does not > > > > > > > have a primary key, batch writing may cause data duplication, > but > > > > > > > UNIQ The > > > > > > > model has a primary key, which ensures the idempotence of > writing, > > > > > > > thus > > > > > > > achieving Exactly-Once > > > > > > > > > > > > > > Brs, > > > > > > > di.wu > > > > > > > > > > > > > > > 2024年3月2日 17:50,Jeyhun Karimov je.kari...@gmail.com 写道: > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > Thanks for the proposal. +1 for the FLIP. > > > > > > > > I have a few questions: > > > > > > > > > > > > > > > > - How exactly the two (Stream Load's two-phase commit and > Flink's > > > > > > > > two-phase > > > > > > > > commit) combination will ensure the e2e exactly-once > semantics? > > > > > > > > > > > > > > > > - The FLIP proposes to combine Doris's batch writing with the > > > > > > > > primary key > > > > > > > > table to achieve Exactly-Once semantics. Could you elaborate > more on > > > > > > > > that? > > > > > > > > Why it is not the default behavior but a workaround? > > > > > > > > > > > > > > > > Regards, > > > > > > > > Jeyhun > > > > > > > > > > > > > > > > On Sat, Mar 2, 2024 at 10:14 AM Yanquan Lv > decq12y...@gmail.com > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Thanks for driving this. > > > > > > > > > The content is very detailed, it is recommended to add a > section on > > > > > > > > > Test > > > > > > > > > Plan for more completeness. > > > > > > > > > > > > > > > > > > Di Wu d...@apache.org 于2024年1月25日周四 15:40写道: > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > Previously, we had some discussions about contributing > Flink Doris > > > > > > > > > > Connector to the Flink community [1]. I want to further > promote > > > > > > > > > > this > > > > > > > > > > work. > > > > > > > > > > I hope everyone will help participate in this FLIP > discussion and > > > > > > > > > > provide > > > > > > > > > > more valuable opinions and suggestions. > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p > > > > > > > > > > > > > > > > > > > > Brs, > > > > > > > > > > di.wu > > > > > > > > > > > > > > > > > > > > On 2023/12/07 05:02:46 wudi wrote: > > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > As discussed in the previous email [1], about > contributing the > > > > > > > > > > > Flink > > > > > > > > > > > Doris Connector to the Flink community. > > > > > > > > > > > > > > > > > > > > > > Apache Doris[2] is a high-performance, real-time > analytical > > > > > > > > > > > database > > > > > > > > > > > based on MPP architecture, for scenarios where Flink > is used for > > > > > > > > > > > data > > > > > > > > > > > analysis, processing, or real-time writing on Doris, > Flink Doris > > > > > > > > > > > Connector > > > > > > > > > > > is an effective tool. > > > > > > > > > > > > > > > > > > > > > > At the same time, Contributing Flink Doris Connector > to the Flink > > > > > > > > > > > community will further expand the Flink Connectors > ecosystem. > > > > > > > > > > > > > > > > > > > > > > So I would like to start an official discussion > FLIP-399: Flink > > > > > > > > > > > Connector Doris[3]. > > > > > > > > > > > > > > > > > > > > > > Looking forward to comments, feedbacks and suggestions > from the > > > > > > > > > > > community on the proposal. > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p > > > > > > > > > > > [2] > > > > > > https://doris.apache.org/docs/dev/get-starting/what-is-apache-doris/ > > > > > > > > > > > > > > [3] > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-399%3A+Flink+Connector+Doris > > > > > > > > > > > > > > Brs, > > > > > > > > > > > > > > > > > > > > > > di.wu >