Hi community,

Here is a summary of the discussion about this feature from the weekly meeting:

**Q1: In the k8s environment, users can choose to mount persistent volumes
(e.g., [OSS](https://help.aliyun.com/document_detail/130911.html)) to
synchronize task logs to remote storage.**
R1: This is indeed a way to synchronize logs to remote storage, and it only
requires mounting persistent volumes (PVs). However, it still has some
shortcomings:
* **Efficiency**: Since the PV is connected to the remote storage, log writes
become slower, which in turn affects task execution on the worker. In
contrast, uploading the task log to remote storage asynchronously through the
remote logging mechanism does not affect task execution.
* **Generality**: PV is not suitable for some remote storage, such as
Elasticsearch. It is also not applicable to DS deployed in a non-k8s
environment.

**Q2: Users can configure whether to use remote storage for task logs.**
R2: Yes, users can decide whether to enable remote log storage through
configuration, and specify the corresponding remote logging settings.
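
For illustration, here is a minimal sketch of reading such a switch from a
properties file (the key names `remote.logging.enable` and
`remote.logging.target` are placeholders, not final configuration names):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public class RemoteLoggingConfigExample {

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Load the worker/master configuration file (the path is illustrative).
        try (InputStream in = Files.newInputStream(Paths.get("conf/common.properties"))) {
            props.load(in);
        }
        // Placeholder keys; the real names would be settled in the design/PR.
        boolean enabled = Boolean.parseBoolean(props.getProperty("remote.logging.enable", "false"));
        String target = props.getProperty("remote.logging.target", "OSS");
        System.out.println("remote logging enabled=" + enabled + ", target=" + target);
    }
}
```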

**Q3: The master-server also has task logs, which need to be uploaded to
remote storage in a unified manner.**
R3: Yes, users can set the remote storage configuration for the master's task
logs in the master's configuration.

**Q4: Is it possible to set the task log retention policy through the
configuration supported by the remote storage itself?**
R4: This is a good idea and it could simplify the design of remote logging;
I'll look into it.

Related issue: https://github.com/apache/dolphinscheduler/issues/13017

Thanks again for all the suggestions at the weekly meeting. Please correct me
if I'm wrong.

Best Regards,
Rick Cheng


Rick Cheng <[email protected]> wrote on Mon, Nov 28, 2022 at 13:24:

> Hi community,
>
> Related issue: https://github.com/apache/dolphinscheduler/issues/13017
>
> Currently, DS only supports writing task logs to the local file system on
> the worker. This issue discusses the feature design of remote logging.
>
> # Why remote logging?
> * Avoid task log loss after the worker is torn down
> * Easier to obtain logs and troubleshoot after logs are aggregated in
> remote storage
> * Enhanced cloud-native support for DS
>
> # Feature Design
>
> ## Connect to different remote targets
> DS can support a variety of common remote storage backends, and should be
> easily extensible to other types of remote storage:
> * S3
> * OSS
> * ElasticSearch
> * Azure Blob Storage
> * Google Cloud Storage
> * ...
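>
> As a rough sketch of the extensibility idea (the interface and method names
> below are hypothetical, not an existing DS API), each backend could
> implement a small handler interface:
>
> ```java
> import java.io.IOException;
> import java.io.InputStream;
>
> // Hypothetical plugin-style interface; the real abstraction is not defined yet.
> public interface RemoteLogHandler {
>
>     // Upload a finished task log file to the remote target (S3, OSS, ES, ...).
>     void sendRemoteLog(String localLogPath, String remoteLogPath) throws IOException;
>
>     // Fetch a task log back from the remote target, e.g. for the api-server.
>     InputStream getRemoteLog(String remoteLogPath) throws IOException;
> }
> ```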
>
> ## When to write logs to remote storage
> Like Airflow, DS writes the task log to remote storage after the task
> completes (success or failure).
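>
> As a minimal sketch of that flow (class and method names are assumptions for
> illustration), the worker could hand the finished log to a background thread
> so the upload never blocks task execution:
>
> ```java
> import java.io.IOException;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
>
> // Illustrative only: a single-threaded uploader the worker calls once a task
> // reaches a final state. RemoteLogHandler is the hypothetical interface above.
> public class AsyncRemoteLogUploader {
>
>     private final ExecutorService executor = Executors.newSingleThreadExecutor();
>     private final RemoteLogHandler handler;
>
>     public AsyncRemoteLogUploader(RemoteLogHandler handler) {
>         this.handler = handler;
>     }
>
>     // Called after the task completes (success or failure).
>     public void uploadOnTaskFinished(String localLogPath, String remoteLogPath) {
>         executor.submit(() -> {
>             try {
>                 handler.sendRemoteLog(localLogPath, remoteLogPath);
>             } catch (IOException e) {
>                 // Keep the local log; the upload can be retried later.
>                 e.printStackTrace();
>             }
>         });
>     }
> }
> ```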
>
> ## How to read logs
> Since the task log is stored both on the worker's local disk and in remote
> storage, the `api-server` needs a reading strategy when it reads the log of
> a given task instance.
>
> Airflow first tries to read the remotely stored logs, and falls back to the
> local logs if that fails. I prefer to try the local log first, and read the
> remote log only if the local log file does not exist.
>
> We could discuss this further.
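>
> For illustration, a sketch of the local-first strategy (names are
> placeholders, not an actual DS API):
>
> ```java
> import java.io.IOException;
> import java.io.InputStream;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
>
> // Illustrative read path for the api-server: prefer the worker-local file and
> // fall back to the remote copy only when the local file no longer exists.
> public class TaskLogReader {
>
>     private final RemoteLogHandler handler;
>
>     public TaskLogReader(RemoteLogHandler handler) {
>         this.handler = handler;
>     }
>
>     public InputStream readTaskLog(String localLogPath, String remoteLogPath) throws IOException {
>         Path local = Paths.get(localLogPath);
>         if (Files.exists(local)) {
>             return Files.newInputStream(local);
>         }
>         // Local file is gone (e.g. the worker pod was torn down), so read remote.
>         return handler.getRemoteLog(remoteLogPath);
>     }
> }
> ```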
>
> ## Log retention strategy
>
> For example, a maximum capacity can be set for the remote storage, and old
> logs can be deleted on a rolling basis.
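>
> As a purely illustrative sketch (the class and its bookkeeping are
> assumptions, and a real implementation would list and delete objects on the
> remote storage itself), rolling deletion by maximum capacity could look
> like this:
>
> ```java
> import java.util.ArrayDeque;
> import java.util.Deque;
>
> // In-memory model of "delete the oldest logs once the total size exceeds a cap".
> public class RollingRetentionPolicy {
>
>     private final long maxBytes;
>     private long usedBytes;
>     private final Deque<Long> logSizes = new ArrayDeque<>(); // oldest first
>
>     public RollingRetentionPolicy(long maxBytes) {
>         this.maxBytes = maxBytes;
>     }
>
>     // Called after each log upload with the uploaded file's size.
>     public void onLogUploaded(long sizeBytes) {
>         logSizes.addLast(sizeBytes);
>         usedBytes += sizeBytes;
>         // Roll: drop the oldest logs until the total size is back under the cap.
>         while (usedBytes > maxBytes && !logSizes.isEmpty()) {
>             usedBytes -= logSizes.removeFirst();
>         }
>     }
> }
> ```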
>
> # Sub-tasks
> WIP
>
> Any comments or suggestions are welcome.
>
> Best Regards,
> Rick Cheng
>
