Great proposal! I have a few questions to help me understand it better.

1. If the same task is executed multiple times, will these JARs be shared
between the runs? When one task ends, will that affect other tasks?

2. Can we cache these JARs, so that the next task does not need to load them again?
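
To make the two questions concrete, here is a rough sketch of the behaviour I have in mind. It is only an illustration, not a proposal for the actual implementation: the class RefCountedJarCache and its methods are hypothetical names, not existing SeaTunnel APIs, and I am assuming JARs are identified by a SHA-256 hash of their content. Jobs that upload the same JAR share one stored copy, and the copy is removed only when the last job using it releases it.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not a SeaTunnel class: JAR storage shared between jobs. */
class RefCountedJarCache {
    // Stored JAR bytes, keyed by the SHA-256 hex digest of the content.
    private final Map<String, byte[]> jars = new HashMap<>();
    // Number of running jobs currently using each stored JAR.
    private final Map<String, Integer> refCounts = new HashMap<>();

    /** Called when a job needs a JAR; identical content is stored only once. */
    synchronized String acquire(byte[] content) {
        String key = sha256Hex(content);
        jars.putIfAbsent(key, content);
        refCounts.merge(key, 1, Integer::sum);
        return key;
    }

    /** Called when a job ends; the JAR is deleted only if no other job uses it. */
    synchronized void release(String key) {
        Integer remaining =
                refCounts.computeIfPresent(key, (k, n) -> n > 1 ? n - 1 : null);
        if (remaining == null) {
            jars.remove(key);
        }
    }

    synchronized boolean isCached(String key) {
        return jars.containsKey(key);
    }

    synchronized int refCount(String key) {
        return refCounts.getOrDefault(key, 0);
    }

    private static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

With something like this, a finished task would not break other tasks that still use the same JAR (question 1). For question 2, the entry could be kept for some time after the count reaches zero instead of being removed immediately, so that the next run of the same task finds it already cached.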


Looking forward to your reply.
--

Best Regards

------------

Liugddx
[email protected]


梁欢 <[email protected]> wrote on Mon, Jul 3, 2023 at 20:33:

> Hello everyone,
>
> When the Zeta engine submits a job, the client first loads the connector
> plugin locally and saves the absolute paths of the connector JAR package
> and of the third-party JAR packages that the connector depends on at
> runtime (such as the database driver package) in the logical execution
> plan of the job. After the task is submitted to the Zeta engine server,
> the server obtains the paths of the JAR packages required by each task
> from the logical execution plan. It then uses these paths to load the JAR
> packages on the server and run the task.
>
>
>
>
> However, this approach has two significant limitations:
>
> 1. The server needs to have all connectors and their dependent JAR
> packages.
>
> 2. The installation path of the client must be exactly the same as that
> of the server, and the installation path of SeaTunnel Zeta must also be
> the same on all nodes.
>
> As a result, the engine side of SeaTunnel Zeta is relatively heavy, and
> the container image becomes very large when tasks are submitted in Docker
> or Kubernetes (K8S).
>
>
>
>
> To address these limitations, we need to optimize the logic of the Zeta
> engine when executing tasks. The server should only have the core JAR
> package of the engine, while all connector packages should reside on the
> client side. When submitting tasks, the client should upload the required
> JAR packages to the server instead of just recording their paths. When
> the server executes a job, it should download the required JAR packages
> and then load them. Once the job is completed, the JAR packages can be
> deleted.
>
>
>
>
> In Docker or K8S mode, the project currently provides no unified JAR
> package management service, either for connector JAR packages or for the
> JAR packages that connectors depend on. To reduce the container image
> size, only the framework package of the Zeta engine needs to be included
> in the container image. The connector JAR packages and the third-party
> JAR packages that the connectors depend on can be uploaded to the pod
> separately for distribution. Therefore, a component that supports
> uploading and downloading all JAR package files must be implemented on
> the JobMaster side. The client that submits the task is responsible for
> uploading the connector JAR packages and their third-party dependencies
> to this component for unified management. All TaskExecutors deployed in
> different containers are responsible for downloading the required JAR
> packages from this component. The service component on the JobMaster side
> needs to ensure reliable file management until the SeaTunnel task
> completes, by persisting JAR packages to the local file system or to a
> distributed storage service such as HDFS or S3.
>
>
>
>
> For the details of this feature design, please refer to [1].
>
>
>
>
> [1] https://github.com/apache/seatunnel/issues/5012
>
>
>
>
> Best wishes!
>
> Huan Liang
