baifangkual opened a new issue, #6913:
URL: https://github.com/apache/seatunnel/issues/6913

   ### Search before asking
   
   - [X] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.
   
   
   ### Description
   
   The feasibility of SeaTunnel as the underlying support for an ETL tool?
   
   > Due to my limited proficiency in English, the following content is 
translated from Chinese. The equivalent content in Chinese is provided in the 
footnote below. I apologize for any inconvenience caused.
   ### English Description:
   I need to develop an ETL tool that supports operations such as intersection and union across multiple tables. After studying the SeaTunnel source code for some time, I found that data in SeaTunnel is currently represented by the `SeaTunnelRow` type, which holds a single row. Because a transform has no access to the table structure or to whole-table information at runtime, implementing a transform plugin as described in the documentation cannot satisfy the requirement for multi-table operations. In addition, the SQL transform does not currently support clauses such as JOIN and GROUP BY. To meet my requirements, I urgently need answers to the following questions:
   1. Does SeaTunnel plan to introduce a Table-based data set type in the future? (Given SeaTunnel's current logical structure, complex operations in a transform seem unlikely to be achievable, especially those that need part of a table's metadata.)
   2. How difficult would it be to do secondary development on SeaTunnel to implement transform operations on Table data sets while leaving the current logical structure untouched, or changing it as little as possible? (From my reading of the source code, this may require new code such as another data set type like FullTable, plus additional process-control types alongside SeaTunnelTask, FlowLifeCycle, etc.)
   3. Looking at the flow from job submission to execution, the upper layers are highly abstract, while the lower-level process control and lifecycle are currently implemented around the `SeaTunnelRow` type. Are there plans to add other lower-level implementations in the future?
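
   To make the gap described above concrete, here is a minimal Java sketch. Everything in it (`RowVsTableSketch`, `row`, `mapRows`, `intersect`) is a hypothetical name invented for illustration and is not part of the SeaTunnel API. It only contrasts a row-at-a-time transform, which sees one row and nothing else, with a table-level operation such as intersection, which must have one whole input available before it can emit anything.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical types for illustration only; none of these are SeaTunnel classes.
class RowVsTableSketch {

    // Builds one row as an immutable list of field values (assumption: no null fields).
    static List<Object> row(Object... values) {
        return List.of(values);
    }

    // Row-at-a-time shape: fn sees a single row, with no schema and no other rows.
    static List<List<Object>> mapRows(List<List<Object>> rows,
                                      Function<List<Object>, List<Object>> fn) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> r : rows) {
            out.add(fn.apply(r));
        }
        return out;
    }

    // Table-level shape: the whole right input is materialized before any left row is emitted.
    static List<List<Object>> intersect(List<List<Object>> left, List<List<Object>> right) {
        Set<List<Object>> rightRows = new HashSet<>(right);
        Set<List<Object>> emitted = new HashSet<>();
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> r : left) {
            // Keep a left row only if it also occurs on the right, and only once.
            if (rightRows.contains(r) && emitted.add(r)) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> a = List.of(row(1, "a"), row(2, "b"));
        List<List<Object>> b = List.of(row(2, "b"), row(3, "c"));
        System.out.println(intersect(a, b)); // prints [[2, b]]
    }
}
```

   In this sketch `intersect` cannot be expressed through `mapRows`, because the function passed to `mapRows` never sees the second input. That is the kind of operation the questions above are about.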
   
   Thank you for your response!
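   ### Chinese Description (translated to English):
   
   I need to build an ETL tool that supports operations such as intersection and union across multiple tables. After spending some time studying the SeaTunnel source code, I found that SeaTunnel's data set is currently of type `SeaTunnelRow`, which represents one row of data. Because a transform lacks the table structure and whole-table information at runtime, implementing a transform plugin directly as the documentation describes cannot meet the need for multi-table operations, and the SQL transform does not currently support clauses such as JOIN and GROUP BY either. To meet my requirement, I urgently need answers to the following questions:
   1. Does SeaTunnel plan to provide a data set of type Table in the future? (With SeaTunnel's current logical structure, it seems unlikely that a transform can implement complex operations, especially ones that need part of a table's metadata.)
   2. Would it be difficult to do secondary development on SeaTunnel to implement transform operations on Table data sets while not affecting, or affecting as little as possible, SeaTunnel's current logical structure? (From my reading of the source code so far, this may require code such as another data set type like FullTable, and additional process-control types alongside SeaTunnelTask, FlowLifeCycle, etc.)
   3. Looking at the overall flow from job submission to execution, I see that the upper layers are very abstract, while the lower layer currently implements process control and lifecycle around the `SeaTunnelRow` type. Are other lower-layer implementations planned?
   
   Thank you for your answer!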
   
   ### Usage Scenario
   
   ### Usage Scenario English
   I need to develop an ETL tool whose data transformation stage supports operations such as the union of multiple tables. I intend to use SeaTunnel as the underlying framework and do secondary development on top of it.
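   ### Usage Scenario Chinese (translated to English)
   I need to build an ETL tool whose intermediate data transformation step must support operations such as the union of multiple tables, and I want to use SeaTunnel as the underlying layer and do secondary development on top of it.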
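
   As a rough sketch of the scenario, the following Java example shows a multi-table union that depends on table-level metadata. `TableUnionSketch`, `Table`, and `unionAll` are hypothetical names, not existing SeaTunnel types. The assumption is that each input carries its column names, so the operator can check that the inputs are compatible before concatenating their rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical table-shaped data set for illustration; not an existing SeaTunnel type.
class TableUnionSketch {

    // A table is its column names plus all of its rows.
    static final class Table {
        final List<String> columns;
        final List<List<Object>> rows;

        Table(List<String> columns, List<List<Object>> rows) {
            this.columns = List.copyOf(columns);
            this.rows = List.copyOf(rows);
        }
    }

    // UNION ALL: concatenates rows after checking that every input has the same columns.
    static Table unionAll(List<Table> inputs) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("at least one table is required");
        }
        List<String> columns = inputs.get(0).columns;
        List<List<Object>> rows = new ArrayList<>();
        for (Table t : inputs) {
            // This check needs schema information that a single row does not carry.
            if (!t.columns.equals(columns)) {
                throw new IllegalArgumentException(
                        "column mismatch: " + t.columns + " vs " + columns);
            }
            rows.addAll(t.rows);
        }
        return new Table(columns, rows);
    }
}
```

   The schema check is the part that a purely row-based transform cannot perform on its own, since a single row does not describe the table it came from.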
   
   ### Related issues
   
   no
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

