I agree with Zongwen Li. The goal of SeaTunnel is data synchronization, not an
ETL computing architecture. SeaTunnel should focus on EL (extract and load) and
only support transformation at the UDF level. If SeaTunnel is a project
dedicated to data synchronization, why does it have to rely on Flink and Spark?
To stay focused on data synchronization, SeaTunnel should concentrate on making
it easier for developers to build connectors and easier for users to complete
their data synchronization tasks. Right now, however, the community has to
spend most of its energy on adapting to different versions of Flink and Spark.
That is not the state of a project with clear goals.

I personally think that a unified API layer decoupled from the engine is only
the first step toward SeaTunnel's goal (a rough sketch of such an API follows
the list below). After completing the API unification, we still have a lot to do:
1. A front-end UI built on the unified API. It should be as easy to use as
Airbyte, so that users can focus on defining their own data synchronization
tasks rather than on SeaTunnel's underlying engine.
2. Based on the unified API, an easy-to-use monitoring and alerting system.
This will help users understand the status of their data synchronization
tasks, their monitoring metrics, and so on.
3. Solving the shortcomings of Flink and Spark as data synchronization
engines:
      a. Real-time synchronization tasks based on Flink cannot share
resources between jobs. When a user has a large number of small tables, this
wastes resources.
      b. In Flink-based real-time tasks, any subtask failure stops data
processing in all subtasks of the entire job. In some cases the database's
binlog expires and is deleted quickly; if the whole synchronization job cannot
run normally because of a sink exception, the binlog may be deleted before it
can ever be read.
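
To make the first step concrete, here is a rough sketch of what an
engine-decoupled connector API could look like. The names below
(UnifiedSource, SourceReader, RecordCollector, discoverSplits, pollNext) are
hypothetical illustrations only, not the actual API proposed in issues #1701
and #1704. The point is that a connector written against such interfaces
knows nothing about Flink or Spark; each engine needs just one adapter layer,
so supporting a new engine version means changing only that adapter while
every connector keeps working unchanged.

    // Hypothetical sketch only -- not the actual SeaTunnel API.
    // A connector implements these engine-agnostic interfaces once; a
    // separate translation layer adapts them to Flink, Spark, or any
    // future engine, so connector code never imports engine classes.

    import java.io.Serializable;
    import java.util.List;

    /** A single row flowing through the synchronization pipeline. */
    interface SeaTunnelRow extends Serializable {
        Object getField(int index);
    }

    /** Callback supplied by the engine adapter to receive rows. */
    interface RecordCollector {
        void collect(SeaTunnelRow row);
    }

    /** Reads rows from one split and hands them to the engine adapter. */
    interface SourceReader<SplitT> extends AutoCloseable {
        /** Poll some rows; the adapter decides batching and checkpointing. */
        void pollNext(RecordCollector collector) throws Exception;
    }

    /** Engine-independent source connector: only splits and readers. */
    interface UnifiedSource<SplitT extends Serializable> extends Serializable {
        /** Enumerate work units, e.g. table chunks or binlog offsets. */
        List<SplitT> discoverSplits();

        /** Create a reader for one split; invoked by the engine adapter. */
        SourceReader<SplitT> createReader(SplitT split);
    }

Under this assumption, a JDBC or CDC connector would be implemented once
against these interfaces, and the Flink and Spark translation layers would
wrap SourceReader into their own source operators.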

jianju1024 <[email protected]> wrote on Sunday, May 8, 2022 at 15:20:

>
> Hi all,
>
> I feel the same way: is it like Spark? Flink? Beam? It's neither fish nor fowl!
> 1. Why break ETL apart? What exactly are E, T, and L? They can't really be separated in the first place.
> 2. As for building on multiple engines, quite a few people in several WeChat groups have discussed the topic for a long time without reaching a conclusion.
> 3. If the goal is to solve the ETL problem, why struggle with an abstraction over two different engines?
> 4. Compared with DataX and FlinkX, it falls far short.
>
> > On May 7, 2022, at 17:36, Zongwen Li <[email protected]> wrote:
> >
> > The goal of Apache SeaTunnel is different from that of Apache Beam.
> > Apache SeaTunnel focuses on source and sink connectors and develops
> > features in the field of data integration.
> > Apache Beam focuses on unifying all the functions of the compute engine,
> > including operators such as join, connect, and map, and it doesn't unify
> > streaming and batch sources.
> >
> > This improvement proposal aims to solve the current problems encountered by
> > SeaTunnel. If you have better ideas, please bring them up for
> > discussion.
> >
> > Best,
> > Zongwen Li
> >
> > leo65535 <[email protected]> wrote on Friday, April 29, 2022 at 16:14:
> >
> >>
> >>
> >> Hi @zongwen,
> >>
> >>
> >> I don't think this is a good idea; it seems that we will become more and
> >> more like Apache Beam.
> >>
> >>
> >> Best,
> >> Leo65535
> >>
> >>
> >> At 2022-04-18 15:10:08, "李宗文" <[email protected]> wrote:
> >>> Hi All,
> >>> In the current implementation of SeaTunnel, the connector is coupled
> >>> with the computing engine, which means a connector needs to be
> >>> implemented for each engine, and it is difficult to support multiple
> >>> versions of an engine.
> >>>
> >>> Through the questionnaire, we found that users run multiple versions
> >>> of the Spark and Flink engines, and they also hope that SeaTunnel will
> >>> support Change Data Capture (CDC) connectors.
> >>>
> >>> Based on the above questions and needs, I created an improvement
> >>> proposal:
> >>> https://github.com/apache/incubator-seatunnel/issues/1608
> >>> Preliminary ideas for the Source and Sink API:
> >>> https://github.com/apache/incubator-seatunnel/issues/1701
> >>> https://github.com/apache/incubator-seatunnel/issues/1704
> >>>
> >>> Please feel free to discuss!
> >>>
> >>> Zongwen Li
> >>
> >



-- 

Best Regards

------------

Apache DolphinScheduler PMC

Jun Gao
[email protected]
