lianghuan-xatu opened a new issue, #5012: URL: https://github.com/apache/seatunnel/issues/5012
### Search before asking

- [X] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.

### Description

## Introduction

We need to optimize how the Zeta engine executes jobs. The server should carry only the engine's core jar package, while all connector packages live on the client side. When a job is submitted, the client should upload the required jar packages to the server instead of merely passing along the jar package paths. When the server executes the job, it downloads the jar packages the job requires and then loads them; after the job finishes, the jar packages are deleted.

## Overall Architecture

<img src="https://imagehost-1311381309.cos.ap-nanjing.myqcloud.com/架构.png"/>

### ConnectorPackageServer:

- Responds to upload and download requests for connector JAR packages and the third-party JAR packages they depend on.
- Reads and writes connector JAR package files on the local file system.
- Reads and writes the third-party JAR files that connectors depend on via the local file system.
- Supports reads and writes against an HA distributed file system.
- Responsible for cleaning up both the local file system and the distributed file system.
- Download requests prefer files on the local file system; for HA recovery, files are downloaded from the distributed file system to the local file system.

### ConnectorPackageClient:

- Caches files stored on the ConnectorPackageServer in the local file system for reading and writing.
- Prefers files on the local file system, then tries to obtain them from the ConnectorPackageServer running on the JobMaster side, and finally falls back to downloading them from the HA distributed file system.
- Responsible for cleaning the local file system.

### ConnectorPackageHAStorage:

ConnectorPackageServer only keeps files temporarily and performs no dedicated persistence.
To ensure jobs can recover from checkpoints after a failure during execution, the JAR packages must be persisted to the local file system or to another distributed storage service, such as HDFS or S3.

### ConnectorJarsLoader:

ConnectorJarsLoader is the bridge between the downloaded connector JAR packages (together with the JARs they depend on) and the ClassLoader: it loads these JAR files onto the classpath through the corresponding ClassLoader.

## Process Design

<img src="https://imagehost-1311381309.cos.ap-nanjing.myqcloud.com/Job Submit Design.jpg"/>

### Uploads the JAR package files

While creating the job execution environment, the SeaTunnel client searches the SEATUNNEL_HOME/plugins directory for the third-party JAR packages that the connectors depend on. After the job configuration file is parsed by plugin type, the unique identifier of each connector plugin is passed to PluginDiscovery. Based on that identifier, PluginDiscovery looks up the JAR package name prefix of the corresponding connector in the relevant configuration file (mapping.properties). Finally, PluginDiscovery locates the connector JAR package in the directory where connector JAR packages are stored and uploads it to the Hazelcast master node before the job submission process. The file is first written to JobMaster's local storage and then transferred to HA storage; the upload is confirmed only after both writes succeed (if HA is configured).

### JAR package file integrity verification

Verifying files during the upload process adds an integrity check on the uploaded files. This helps ensure that uploaded files are not tampered with or corrupted during transmission and storage.
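One common way to implement such an integrity check is a content digest computed on both sides of the transfer: the client sends the digest along with the file, and the server recomputes it after the write. A minimal sketch (the class and method names here are illustrative assumptions, not SeaTunnel's actual API):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compute a SHA-256 digest of a JAR file on the client
// before upload, and recompute it on the server after the transfer. The
// upload is confirmed only when both digests match.
public final class JarChecksum {

    public static String sha256Hex(Path jarFile) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(jarFile)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Server-side check after the file has been written to local storage.
    public static boolean verify(Path jarFile, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(jarFile).equalsIgnoreCase(expectedHex);
    }
}
```

The same check can be reused on the execution side when a task node downloads a JAR from JobMaster or from HA storage.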
### Obtain JAR package files

JobMaster creates a job scheduler to schedule and deploy the physical node tasks of the job. The first step in deploying a task through the task execution service is to obtain, from the job information, the URLs of the JAR packages the task requires. The second step is to check whether those JAR packages already exist locally. If the connector JAR package, or a third-party JAR package it depends on, is missing from the executing node's local storage, the node must obtain it from JobMaster. This step is therefore the JAR retrieval process on the execution side. As with uploads, the execution side must verify the integrity of each JAR file it obtains; only after verification succeeds does class loading proceed. Once the classes are loaded, the execution service runs the task.

### JobMaster Reliability Assurance (HA)

When the master node crashes, Hazelcast automatically elects another node as master. The newly elected master goes through JobMaster re-initialization. To keep previously submitted jobs running, we must restore the locally stored connector JAR packages and the third-party JAR packages they depend on: the required JAR files are fetched from the distributed file system and loaded.

### Clear the JAR package files

Cleanup covers not only the local file system of the Hazelcast master node running JobMaster, but also the job's connector JAR packages and their third-party dependency JARs in the distributed file system.
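The local-first lookup order described above (local cache, then the ConnectorPackageServer on the JobMaster, then HA storage) could be sketched roughly as follows; all class and method names here are illustrative assumptions, not the real SeaTunnel API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Illustrative sketch of the lookup order when a task needs a JAR:
// 1. local cache, 2. the ConnectorPackageServer on the JobMaster,
// 3. the HA distributed storage. All names are hypothetical.
public class ConnectorJarResolver {

    /** Minimal stand-in for any remote JAR source (JobMaster or HA storage). */
    interface RemoteStore {
        Optional<byte[]> download(String jarName);
    }

    private final Path localCacheDir;
    private final RemoteStore jobMasterStore; // ConnectorPackageServer on JobMaster
    private final RemoteStore haStore;        // e.g. HDFS or S3

    public ConnectorJarResolver(Path localCacheDir, RemoteStore jobMasterStore, RemoteStore haStore) {
        this.localCacheDir = localCacheDir;
        this.jobMasterStore = jobMasterStore;
        this.haStore = haStore;
    }

    public Path resolve(String jarName) throws IOException {
        Path local = localCacheDir.resolve(jarName);
        if (Files.exists(local)) {
            return local; // 1. already present on this node
        }
        byte[] content = jobMasterStore.download(jarName)   // 2. ask the JobMaster
                .or(() -> haStore.download(jarName))        // 3. fall back to HA storage
                .orElseThrow(() -> new IllegalStateException("JAR not found: " + jarName));
        Files.createDirectories(localCacheDir);
        Files.write(local, content); // cache it for later tasks of the same job
        return local;
    }
}
```

Caching the downloaded file locally means later tasks of the same job on the same node skip the remote round-trip; the cleanup step then removes these cached copies after the job completes.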
### Connector JAR file type

Through the ConnectorPackageClient, the JAR package of a connector, the JAR packages it depends on, and the user's own JAR files can be downloaded from the service run by JobMaster and stored on the local disk. All JAR package files fall roughly into two types:

- COMMON_PLUGIN_JAR
- CONNECTOR_PLUGIN_JAR

## Abstract Interface

<img src="https://imagehost-1311381309.cos.ap-nanjing.myqcloud.com/zeta1.drawio.png"/>

This class implements a server that stores connector JAR packages and the JAR files that connectors depend on. The server listens for incoming requests and spawns threads to handle them. It is also responsible for creating the directory structures that temporarily or permanently store connector JAR packages and the third-party JAR files they depend on.

<img src="https://imagehost-1311381309.cos.ap-nanjing.myqcloud.com/类图2.drawio.png"/>

The ConnectorPackageClient communicates with the ConnectorPackageServer, sending it file upload and download requests.

### Usage Scenario

_No response_

### Related issues

_No response_

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
