Hi community,

Related issue: https://github.com/apache/dolphinscheduler/issues/13017

Currently, DS only supports writing task logs to the local file system on the
worker. This issue discusses the feature design of remote logging.

# Why remote logging?
* Avoid task log loss after worker is torn down
* Easier to obtain logs and troubleshoot after logs are aggregated in
remote storage
* Enhanced cloud-native support for DS

# Feature Design

## Connect to different remote targets
DS should support a variety of common remote storage backends, and be easily
extensible to other types of remote storage:
* S3
* OSS
* ElasticSearch
* Azure Blob Storage
* Google Cloud Storage
* ...
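
To keep the storage layer pluggable, the worker could talk to remote storage
through a small interface and load concrete implementations (S3, OSS, ES, ...)
as plugins. The sketch below is illustrative only; `RemoteLogHandler` and its
method names are assumptions, not existing DS classes.

```java
// Hypothetical plugin-style interface; names are illustrative, not existing DS classes.
import java.io.IOException;
import java.io.InputStream;

public interface RemoteLogHandler {

    /** Upload a finished task log file to remote storage under the given key. */
    void sendRemoteLog(String localLogPath, String remoteKey) throws IOException;

    /** Open the remote copy of a task log, e.g. for the api-server to stream it back. */
    InputStream getRemoteLog(String remoteKey) throws IOException;

    /** Storage type this handler serves, e.g. "S3", "OSS", "ES". */
    String getStorageType();
}
```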

## When to write logs to remote storage
Like Airflow, DS writes the task log to remote storage after the task
completes (success or failure).
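
A rough sketch of the write path, assuming the hypothetical `RemoteLogHandler`
interface above: the worker calls a hook once the task instance reaches a
terminal state and pushes the local log file to remote storage. All class and
method names here are placeholders, not existing DS code.

```java
// Illustrative only: push the local log to remote storage once the task instance
// reaches a terminal state (success or failure).
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RemoteLogUploader {

    private final RemoteLogHandler handler;   // e.g. an S3/OSS/ES implementation

    public RemoteLogUploader(RemoteLogHandler handler) {
        this.handler = handler;
    }

    /** Called by the worker after the task completes, regardless of the final state. */
    public void onTaskFinished(int taskInstanceId, Path localLogPath) {
        if (!Files.exists(localLogPath)) {
            return; // nothing to upload, e.g. the task never produced a log
        }
        // Keep the remote key aligned with the local layout so it is easy to locate later.
        String remoteKey = "logs/" + taskInstanceId + "/" + localLogPath.getFileName();
        try {
            handler.sendRemoteLog(localLogPath.toString(), remoteKey);
        } catch (IOException e) {
            // Uploading must never fail the task itself; just record the problem.
            System.err.println("Failed to upload task log to remote storage: " + e.getMessage());
        }
    }
}
```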

## How to read logs
Since the task log is stored both on the worker's local disk and in remote
storage, when the `api-server` needs to read the log of a certain task
instance, it needs to determine the reading strategy.

Airflow first tries to read the logs stored remotely and, if that fails, reads
the local logs. I prefer the opposite order: try to read the local log first,
and read the remote log only if the local log file does not exist.
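
A minimal sketch of that preferred reading order (local first, remote as
fallback), again assuming the hypothetical `RemoteLogHandler` interface; the
class and method names are placeholders.

```java
// Illustrative reading strategy for the api-server: prefer the local file,
// fall back to remote storage when the local copy no longer exists.
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class TaskLogReader {

    private final RemoteLogHandler handler;

    public TaskLogReader(RemoteLogHandler handler) {
        this.handler = handler;
    }

    public InputStream readTaskLog(Path localLogPath, String remoteKey) throws IOException {
        if (Files.exists(localLogPath)) {
            return Files.newInputStream(localLogPath); // local copy still available
        }
        // Worker was torn down or the log was cleaned up locally: read the remote copy.
        return handler.getRemoteLog(remoteKey);
    }
}
```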

We could discuss this further.

## Log retention strategy

We also need a retention strategy for remote logs. For example, a maximum
capacity for remote storage could be configured, and old logs deleted on a
rolling basis.
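
One possible shape of such a rolling-deletion job, purely as a sketch: the
`RemoteLogObject` record and the list/delete methods below are hypothetical
extensions of the handler interface above. In practice we may also be able to
lean on lifecycle rules offered by the storage service (e.g. S3 lifecycle
policies) where available.

```java
// Illustrative rolling retention: when remote usage exceeds a configured cap,
// delete the oldest log objects first. All types and methods here are hypothetical.
import java.io.IOException;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

public class RemoteLogRetention {

    /** Minimal view of a stored log object: key, size in bytes, last modified time. */
    public record RemoteLogObject(String key, long sizeBytes, Instant lastModified) {}

    /** Hypothetical listing/deletion capabilities a retention job would need. */
    public interface RetentionCapableHandler {
        List<RemoteLogObject> listRemoteLogs() throws IOException;
        void deleteRemoteLog(String key) throws IOException;
    }

    public static void enforceMaxCapacity(RetentionCapableHandler handler, long maxBytes)
            throws IOException {
        List<RemoteLogObject> logs = handler.listRemoteLogs();
        long total = logs.stream().mapToLong(RemoteLogObject::sizeBytes).sum();
        // Delete the oldest logs until total usage drops below the configured cap.
        for (RemoteLogObject log : logs.stream()
                .sorted(Comparator.comparing(RemoteLogObject::lastModified))
                .toList()) {
            if (total <= maxBytes) {
                break;
            }
            handler.deleteRemoteLog(log.key());
            total -= log.sizeBytes();
        }
    }
}
```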

# Sub-tasks
WIP

Any comments or suggestions are welcome.

Best Regards,
Rick Cheng
