baifangkual opened a new issue, #6913: URL: https://github.com/apache/seatunnel/issues/6913
### Search before asking

- [X] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.

### Description

Feasibility of using SeaTunnel as the underlying engine for an ETL tool

> Due to my limited proficiency in English, the following content was translated from Chinese. I apologize for any inconvenience.

I need to develop an ETL tool that supports multi-table operations such as intersection and union. After studying the SeaTunnel source code for some time, I found that data in SeaTunnel is currently represented by the `SeaTunnelRow` type, which holds a single row. Because a transform has no access to the table schema or to the table as a whole at runtime, a transform plugin implemented by following the documentation alone cannot perform multi-table operations. In addition, the SQL transform does not currently support clauses such as `JOIN` and `GROUP BY`.

To meet my requirements, I would appreciate answers to the following questions:

1. Does SeaTunnel plan to introduce a table-based data set type in the future? (Given SeaTunnel's current logical structure, complex transform operations, especially those that need part of a table's metadata, seem impossible to achieve.)
2. How difficult would it be to build table-level transform operations on top of SeaTunnel without affecting, or while minimally affecting, its current logical structure? (From reading the source code, this may require implementing another data set type such as `FullTable`, plus additional flow-control classes alongside `SeaTunnelTask`, `FlowLifeCycle`, and so on.)
3. The source code uses a highly abstract structure for the path from job submission to execution, while the lower layer currently implements flow control and lifecycle management around the `SeaTunnelRow` type. Are there plans to provide other implementations at the lower layer in the future?

Thank you for your response!

### Usage Scenario

I need to develop an ETL tool whose data transformation stage must support operations such as the union of multiple tables. I intend to use SeaTunnel as the underlying framework and build my tool on top of it.

### Related issues

No

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
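The limitation described in the issue can be illustrated with a small, self-contained Java sketch. It contrasts a row-at-a-time contract (one row in, one row out, similar in spirit to a transform that only ever sees a single `SeaTunnelRow`) with a table-level contract that receives every row of both inputs. `Row`, `RowTransform`, `TableTransform`, `UNION`, `INTERSECT` and `FIRST_FIELD` are hypothetical names chosen for illustration; they are not SeaTunnel APIs, and a real design would also need to handle schemas, streaming input and memory limits.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical types for illustration only; these are NOT SeaTunnel APIs.
public class TableSetOps {

    // A row modelled as an immutable list of field values (a stand-in for SeaTunnelRow).
    public record Row(List<Object> fields) {
        public static Row of(Object... values) {
            return new Row(List.of(values));
        }
    }

    // Row-at-a-time contract: the transform sees exactly one row, never the rest of the table.
    public interface RowTransform {
        Row map(Row row);
    }

    // Table-level contract: the transform receives every row of both inputs at once.
    public interface TableTransform {
        List<Row> apply(List<Row> left, List<Row> right);
    }

    // Distinct UNION in first-seen order: deduplication needs state spanning all rows of both inputs.
    public static final TableTransform UNION = (left, right) -> {
        Set<Row> seen = new LinkedHashSet<>(left);
        seen.addAll(right);
        return new ArrayList<>(seen);
    };

    // Distinct INTERSECT: a left row can only be emitted once all of `right` has been read.
    public static final TableTransform INTERSECT = (left, right) -> {
        Set<Row> rightRows = new LinkedHashSet<>(right);
        Set<Row> result = new LinkedHashSet<>();
        for (Row row : left) {
            if (rightRows.contains(row)) {
                result.add(row);
            }
        }
        return new ArrayList<>(result);
    };

    // A row-level transform is still enough for per-row work, e.g. keeping only the first field.
    public static final RowTransform FIRST_FIELD = row -> new Row(List.of(row.fields().get(0)));

    public static void main(String[] args) {
        List<Row> a = List.of(Row.of(1, "x"), Row.of(2, "y"));
        List<Row> b = List.of(Row.of(2, "y"), Row.of(3, "z"));
        System.out.println(UNION.apply(a, b));
        System.out.println(INTERSECT.apply(a, b));
        System.out.println(FIRST_FIELD.map(a.get(0)));
    }
}
```

The design point is that `FIRST_FIELD` fits the one-row signature, while `UNION` and `INTERSECT` do not: in general, an intersection cannot emit any row until one input has been fully consumed, so it needs whole-table information rather than a single row.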
