Hi Akash,

so for remote logging: logs can still be sourced from the worker via
the web server, provided the endpoint hosted for this purpose is
reachable. If the logs are not found on the remote store, the web
server attempts to fetch them from the worker or from the local file
system. This is the standard setup for Celery, for example.
Alternatively, a shared log file system can be used, and the webserver
can serve the logs from there.
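
To make that fallback order concrete, here is a minimal sketch in
Python; the helper and URL names are made up for illustration and are
not Airflow's actual handler API:

    import os
    import requests

    def fetch_from_remote_store(remote_url: str) -> str | None:
        """Hypothetical stand-in for a remote log backend (S3, GCS, ...)."""
        return None  # pretend the logs were not found on remote

    def read_task_log(remote_url: str, local_path: str,
                      worker_log_url: str) -> str:
        # 1. Try the remote log store first.
        remote = fetch_from_remote_store(remote_url)
        if remote is not None:
            return remote
        # 2. Fall back to the local (or shared) file system.
        if os.path.exists(local_path):
            with open(local_path) as f:
                return f.read()
        # 3. Finally, ask the worker's log-serving endpoint directly.
        resp = requests.get(worker_log_url, timeout=10)
        resp.raise_for_status()
        return resp.text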

In Airflow 3 (coming soon) there will be an enhanced way to ship logs
while tasks are still in flight.

Otherwise, if you host your workers remotely and there is no network
connection from the webserver to the worker, you can take a look at
the new Edge Worker, which also streams logs in chunks from the edge
site to the central location.
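
As a rough illustration of chunk-wise streaming (the push endpoint and
payload here are invented, not the actual Edge Worker API):

    import time
    import requests

    def stream_log_chunks(log_path: str, push_url: str,
                          interval: float = 5.0) -> None:
        """Tail a task log file and POST any new bytes centrally."""
        offset = 0
        while True:
            with open(log_path, "rb") as f:
                f.seek(offset)
                chunk = f.read()
            if chunk:
                requests.post(push_url, data=chunk, timeout=10)
                offset += len(chunk)
            time.sleep(interval)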

If you want to contribute, helping hands are always welcome. Note,
though, that the log handler structure will probably change soon in
Airflow 3. The limitations of remote log storage also apply: S3 and
Azure Blob do not allow appending to an existing object, so chunks
cannot simply be appended in place.
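
A common way around that limitation is to write every chunk as its own
object and stitch them together on read, roughly like this (bucket and
key layout invented for illustration):

    import boto3

    s3 = boto3.client("s3")

    def upload_chunk(bucket: str, log_prefix: str,
                     seq: int, data: bytes) -> None:
        # Zero-padded sequence keeps lexicographic order == write order.
        s3.put_object(Bucket=bucket,
                      Key=f"{log_prefix}/chunk_{seq:06d}.log",
                      Body=data)

    def read_stitched_log(bucket: str, log_prefix: str) -> bytes:
        """Concatenate all chunk objects under the prefix, in order."""
        parts = []
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=log_prefix):
            for obj in page.get("Contents", []):
                parts.append(
                    s3.get_object(Bucket=bucket,
                                  Key=obj["Key"])["Body"].read())
        return b"".join(parts)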

Jens

On 08.03.25 16:49, Akash Sharma wrote:
Hello everyone,

Whenever remote logging is enabled, logs are only uploaded to the
target path once the task has completed. This makes it harder to
monitor long-running tasks, since there is no way to get their logs in
the meantime.

I have been working on a Handler that saves logs in chunks, where the
decision to cut a chunk is based on two factors -

    1. The maximum time has elapsed since the last chunk was written
    2. The maximum number of bytes has accumulated since the last
       chunk was written

So a chunk is saved either when the maximum time has elapsed or when
the size limit has been exceeded. The chunk files can then be uploaded
as soon as they are created, and served by stitching them back
together (a rough sketch follows below).
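
A minimal sketch of such a handler, assuming a pluggable upload
callback (the threshold defaults are arbitrary):

    import io
    import time
    import logging

    class ChunkedRemoteHandler(logging.Handler):
        """Buffer records; upload a chunk when either threshold is hit.

        Note: the time trigger only fires when a new record arrives; a
        real implementation would also need a background timer thread.
        """

        def __init__(self, upload_chunk, max_seconds=30.0,
                     max_bytes=64 * 1024):
            super().__init__()
            self.upload_chunk = upload_chunk  # callable(seq, data)
            self.max_seconds = max_seconds
            self.max_bytes = max_bytes
            self.buffer = io.StringIO()
            self.last_flush = time.monotonic()
            self.seq = 0

        def emit(self, record):
            self.buffer.write(self.format(record) + "\n")
            too_old = (time.monotonic() - self.last_flush
                       >= self.max_seconds)
            too_big = self.buffer.tell() >= self.max_bytes
            if too_old or too_big:
                self.flush()

        def flush(self):
            data = self.buffer.getvalue()
            if data:
                self.upload_chunk(self.seq, data)
                self.seq += 1
                self.buffer = io.StringIO()
            self.last_flush = time.monotonic()

        def close(self):
            self.flush()  # push any trailing partial chunk
            super().close()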

Do let me know your thoughts.

Best regards,
Akash

