Hi Di,

Thanks for your proposal. +1 for the contribution. I'd like to know your
thoughts about the following questions:

1. According to your clarification of the exactly-once, thanks for it BTW,
no PreCommitTopology is required. Does it make sense to let DorisSink[1]
implement SupportsCommitter, since the TwoPhaseCommittingSink is
deprecated[2] before turning the Doris connector into a Flink connector?
2. OLAP engines are commonly used as the tail/downstream of a data pipeline
to support further e.g. ad-hoc query or cube with feasible pre-aggregation.
Just out of curiosity, would you like to share some real use cases that
will use OLAP engines as the source of a streaming data pipeline? Or it
will only be used as the source for the batch?
3. The E2E test only covered sink[3], if I am not mistaken. Would you like
to test the source in E2E too?

[1]
https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/main/java/org/apache/doris/flink/sink/DorisSink.java#L55
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API
[3]
https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/test/java/org/apache/doris/flink/tools/cdc/MySQLDorisE2ECase.java#L96

Best regards,
Jing

On Tue, Mar 5, 2024 at 11:18 AM wudi <676366...@qq.com.invalid> wrote:

> Hi, Jeyhun Karimov.
> Thanks for your question.
>
> - How to ensure Exactly-Once?
> 1. When the Checkpoint Barrier arrives, DorisSink will trigger the
> precommit api of StreamLoad to complete the persistence of data in Doris
> (the data will not be visible at this time), and will also pass this TxnID
> to the Committer.
> 2. When this Checkpoint of the entire Job is completed, the Committer will
> call the commit api of StreamLoad and commit TxnID to complete the
> visibility of the transaction.
> 3. When the task is restarted, the Txn with successful precommit and
> failed commit will be aborted based on the label-prefix, and Doris' abort
> API will be called. (At the same time, Doris will also abort transactions
> that have not been committed for a long time)
>
> ps: At the same time, this part of the content has been updated in FLIP
>
> - Because the default table model in Doris is Duplicate (
> https://doris.apache.org/docs/data-table/data-model/), which does not
> have a primary key, batch writing may cause data duplication, but UNIQ The
> model has a primary key, which ensures the idempotence of writing, thus
> achieving Exactly-Once
>
> Brs,
> di.wu
>
>
> > 2024年3月2日 17:50,Jeyhun Karimov <je.kari...@gmail.com> 写道:
> >
> > Hi,
> >
> > Thanks for the proposal. +1 for the FLIP.
> > I have a few questions:
> >
> > - How exactly the two (Stream Load's two-phase commit and Flink's
> two-phase
> > commit) combination will ensure the e2e exactly-once semantics?
> >
> > - The FLIP proposes to combine Doris's batch writing with the primary key
> > table to achieve Exactly-Once semantics. Could you elaborate more on
> that?
> > Why it is not the default behavior but a workaround?
> >
> > Regards,
> > Jeyhun
> >
> > On Sat, Mar 2, 2024 at 10:14 AM Yanquan Lv <decq12y...@gmail.com> wrote:
> >
> >> Thanks for driving this.
> >> The content is very detailed, it is recommended to add a section on Test
> >> Plan for more completeness.
> >>
> >> Di Wu <d...@apache.org> 于2024年1月25日周四 15:40写道:
> >>
> >>> Hi all,
> >>>
> >>> Previously, we had some discussions about contributing Flink Doris
> >>> Connector to the Flink community [1]. I want to further promote this
> >> work.
> >>> I hope everyone will help participate in this FLIP discussion and
> provide
> >>> more valuable opinions and suggestions.
> >>> Thanks.
> >>>
> >>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
> >>>
> >>> Brs,
> >>> di.wu
> >>>
> >>>
> >>>
> >>> On 2023/12/07 05:02:46 wudi wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> As discussed in the previous email [1], about contributing the Flink
> >>> Doris Connector to the Flink community.
> >>>>
> >>>>
> >>>> Apache Doris[2] is a high-performance, real-time analytical database
> >>> based on MPP architecture, for scenarios where Flink is used for data
> >>> analysis, processing, or real-time writing on Doris, Flink Doris
> >> Connector
> >>> is an effective tool.
> >>>>
> >>>> At the same time, Contributing Flink Doris Connector to the Flink
> >>> community will further expand the Flink Connectors ecosystem.
> >>>>
> >>>> So I would like to start an official discussion FLIP-399: Flink
> >>> Connector Doris[3].
> >>>>
> >>>> Looking forward to comments, feedbacks and suggestions from the
> >>> community on the proposal.
> >>>>
> >>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
> >>>> [2]
> >> https://doris.apache.org/docs/dev/get-starting/what-is-apache-doris/
> >>>> [3]
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-399%3A+Flink+Connector+Doris
> >>>>
> >>>>
> >>>> Brs,
> >>>>
> >>>> di.wu
> >>>>
> >>>
> >>
>
>

Reply via email to