lianghuan-xatu opened a new issue, #5012: URL: https://github.com/apache/seatunnel/issues/5012
### Search before asking

- [X] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.

### Description

## Introduction

We need to optimize how the Zeta engine executes jobs. The server should carry only the engine's core jar package, while all connector packages live on the client side. When a job is submitted, the client should upload the required jar packages to the server instead of merely passing along the jar package paths. When the server executes the job, it downloads the jar packages the job requires and then loads them; after the job finishes, the jar packages are deleted.

## Overall Architecture

<img src="https://imagehost-1311381309.cos.ap-nanjing.myqcloud.com/架构.png"/>

### ConnectorPackageServer:

- Responds to upload and download requests for connector JAR packages and the third-party JAR packages they depend on.
- Reads and writes connector JAR package files on the local file system.
- Reads and writes the third-party JAR files that connectors depend on via the local file system.
- Supports reads and writes against an HA distributed file system.
- Responsible for cleaning up both the local file system and the distributed file system.
- Download requests prefer files on the local file system; for HA recovery, files are downloaded from the distributed file system to the local file system.

### ConnectorPackageClient:

- Caches files stored on the ConnectorPackageServer in the local file system for reading and writing.
- Prefers files on the local file system, then tries to obtain them from the ConnectorPackageServer running on the JobMaster side, and finally falls back to downloading them from the HA distributed file system.
- Responsible for cleaning the local file system.

### ConnectorPackageHAStorage:

ConnectorPackageServer only keeps files temporarily and performs no dedicated persistence.
To ensure jobs can recover from checkpoints after a failure during execution, the JAR packages must be persisted to the local file system or to another distributed storage service, such as HDFS or S3.

### ConnectorJarsLoader:

ConnectorJarsLoader is the bridge between the downloaded connector JAR packages (together with the JARs they depend on) and the ClassLoader: it loads these JAR files onto the classpath through the corresponding ClassLoader.

## Process Design

<img src="https://imagehost-1311381309.cos.ap-nanjing.myqcloud.com/Job Submit Design.jpg"/>

### Uploads the JAR package files

While creating the job execution environment, the SeaTunnel client searches the SEATUNNEL_HOME/plugins directory for the third-party JAR packages that the connectors depend on. After the job configuration file is parsed by plugin type, the unique identifier of each connector plugin is passed to PluginDiscovery. Based on that identifier, PluginDiscovery looks up the JAR package name prefix of the corresponding connector in the relevant configuration file (mapping.properties). Finally, PluginDiscovery locates the connector JAR package in the directory where connector JAR packages are stored and uploads it to the Hazelcast master node before the job submission process. The file is first written to JobMaster's local storage and then transferred to HA storage; the upload is confirmed only after both writes succeed (if HA is configured).

### JAR package file integrity verification

Verifying files during the upload process adds an integrity check on the uploaded files. This helps ensure that uploaded files are not tampered with or corrupted during transmission and storage.
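One common way to implement such an integrity check is a content digest computed on both sides of the transfer: the client sends the digest along with the file, and the server recomputes it after the write. A minimal sketch (the class and method names here are illustrative assumptions, not SeaTunnel's actual API):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compute a SHA-256 digest of a JAR file on the client
// before upload, and recompute it on the server after the transfer. The
// upload is confirmed only when both digests match.
public final class JarChecksum {

    public static String sha256Hex(Path jarFile) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(jarFile)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Server-side check after the file has been written to local storage.
    public static boolean verify(Path jarFile, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(jarFile).equalsIgnoreCase(expectedHex);
    }
}
```

The same check can be reused on the execution side when a task node downloads a JAR from JobMaster or from HA storage.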
### Obtain JAR package files

JobMaster creates a job scheduler to schedule and deploy the physical node tasks of the job. The first step in deploying a task through the task execution service is to obtain, from the job information, the URLs of the JAR packages the task requires. The second step is to check whether those JAR packages already exist locally. If the connector JAR package, or a third-party JAR package it depends on, is missing from the executing node's local storage, the node must obtain it from JobMaster. This step is therefore the JAR retrieval process on the execution side. As with uploads, the execution side must verify the integrity of each JAR file it obtains; only after verification succeeds does class loading proceed. Once the classes are loaded, the execution service runs the task.

### JobMaster Reliability Assurance (HA)

When the master node crashes, Hazelcast automatically elects another node as master. The newly elected master goes through JobMaster re-initialization. To keep previously submitted jobs running, we must restore the locally stored connector JAR packages and the third-party JAR packages they depend on: the required JAR files are fetched from the distributed file system and loaded.

### Clear the JAR package files

Cleanup covers not only the local file system of the Hazelcast master node running JobMaster, but also the job's connector JAR packages and their third-party dependency JARs in the distributed file system.
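The local-first lookup order described above (local cache, then the ConnectorPackageServer on the JobMaster, then HA storage) could be sketched roughly as follows; all class and method names here are illustrative assumptions, not the real SeaTunnel API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Illustrative sketch of the lookup order when a task needs a JAR:
// 1. local cache, 2. the ConnectorPackageServer on the JobMaster,
// 3. the HA distributed storage. All names are hypothetical.
public class ConnectorJarResolver {

    /** Minimal stand-in for any remote JAR source (JobMaster or HA storage). */
    interface RemoteStore {
        Optional<byte[]> download(String jarName);
    }

    private final Path localCacheDir;
    private final RemoteStore jobMasterStore; // ConnectorPackageServer on JobMaster
    private final RemoteStore haStore;        // e.g. HDFS or S3

    public ConnectorJarResolver(Path localCacheDir, RemoteStore jobMasterStore, RemoteStore haStore) {
        this.localCacheDir = localCacheDir;
        this.jobMasterStore = jobMasterStore;
        this.haStore = haStore;
    }

    public Path resolve(String jarName) throws IOException {
        Path local = localCacheDir.resolve(jarName);
        if (Files.exists(local)) {
            return local; // 1. already present on this node
        }
        byte[] content = jobMasterStore.download(jarName)   // 2. ask the JobMaster
                .or(() -> haStore.download(jarName))        // 3. fall back to HA storage
                .orElseThrow(() -> new IllegalStateException("JAR not found: " + jarName));
        Files.createDirectories(localCacheDir);
        Files.write(local, content); // cache it for later tasks of the same job
        return local;
    }
}
```

Caching the downloaded file locally means later tasks of the same job on the same node skip the remote round-trip; the cleanup step then removes these cached copies after the job completes.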
### Connector JAR file type

Through the ConnectorPackageClient, the JAR package of a connector, the JAR packages it depends on, and the user's own JAR files can be downloaded from the service run by JobMaster and stored on the local disk. All JAR package files fall roughly into two types:

- COMMON_PLUGIN_JAR
- CONNECTOR_PLUGIN_JAR

## Abstract Interface

<img src="https://imagehost-1311381309.cos.ap-nanjing.myqcloud.com/zeta1.drawio.png"/>

This class implements a server that stores connector JAR packages and the JAR files that connectors depend on. The server listens for incoming requests and spawns threads to handle them. It is also responsible for creating the directory structures that temporarily or permanently store connector JAR packages and the third-party JAR files they depend on.

<img src="https://imagehost-1311381309.cos.ap-nanjing.myqcloud.com/类图2.drawio.png"/>

The ConnectorPackageClient communicates with the ConnectorPackageServer, sending it file upload and download requests.

### Usage Scenario

_No response_

### Related issues

_No response_

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
