Hi Akash,

Thanks for starting the discussion.

Adding a bit to what Jens said here: a lot of the logging handler code
might change for AF 3. Also, what you are looking at seems very similar
to what Fluent Bit / Fluentd offer; have you explored those?

Fluent Bit: https://fluentbit.io/
Fluentd: https://www.fluentd.org/

They also support plain Docker ecosystems (it needn't be the
traditional K8s ecosystem).
Check here: https://www.fluentd.org/guides/recipes/docker-logging
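
For illustration, here is a minimal (untested) sketch of wiring the
standard Python logging machinery to a local Fluentd / Fluent Bit agent.
It assumes the fluent-logger package
(https://github.com/fluent/fluent-logger-python); the tag and port are
just the common defaults, nothing Airflow-specific:

    import logging

    from fluent import handler  # pip install fluent-logger

    # Forward records to a local agent speaking the "forward" protocol.
    fluent_handler = handler.FluentHandler(
        "airflow.task", host="localhost", port=24224)
    fluent_handler.setFormatter(handler.FluentRecordFormatter())

    log = logging.getLogger("airflow.task")
    log.addHandler(fluent_handler)
    log.info("shipped to the agent as soon as it is emitted")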

A question: in "which" component are you adding the handler? I think we
would probably benefit from a separate service that does this, one that
shares a "common" volume with the AF components. (Although look out for
the removal of direct DB access in AF 3.)
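
To make that concrete, here is a rough (hypothetical, untested) sketch
of such a sidecar: it tails a task log file on the shared volume and
hands newly appended bytes to whatever uploader is plugged in. The
ship callback is a placeholder, not a real Airflow API, and rotation /
truncation handling is omitted:

    import os
    import time

    def follow(path, ship, poll_interval=1.0):
        """Tail a log file on the shared volume, passing newly
        appended bytes to the ship callback."""
        offset = 0
        while True:
            if os.path.exists(path):
                size = os.path.getsize(path)
                if size > offset:
                    with open(path, "rb") as f:
                        f.seek(offset)
                        ship(f.read())
                    offset = size
            time.sleep(poll_interval)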


Thanks & Regards,
Amogh Desai


On Sun, Mar 9, 2025 at 12:53 AM Akash Sharma <2akash111...@gmail.com> wrote:

> Hi Jens,
> The point here is that the solution should be setup-agnostic, i.e. it
> should work whether the tasks are being run via CeleryExecutor,
> K8sExecutor, CeleryK8sExecutor, etc., and whether or not the executors
> are reachable by the web server.
>
> Best regards,
> Akash
>
> On Sat, Mar 8, 2025 at 11:14 PM Jens Scheffler
> <j_scheff...@gmx.de.invalid> wrote:
>
> > Hi Akash,
> >
> > So for remote logging, logs can still be sourced from the worker via
> > the web server, provided the endpoint hosted for this is reachable.
> > The web server attempts to source the logs from the worker or the
> > local file system if they are not found on the remote store; this is
> > the standard behaviour for Celery, for example. Alternatively, a
> > shared log file system can be used and the webserver can serve logs
> > from there.
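> >
> > For reference, remote logging in Airflow 2 is switched on in the
> > [logging] section of airflow.cfg; the bucket and connection id below
> > are illustrative:
> >
> >     [logging]
> >     remote_logging = True
> >     remote_base_log_folder = s3://my-bucket/airflow/logs
> >     remote_log_conn_id = aws_default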
> >
> > In Airflow 3 (soon) there will be an enhanced way to ship logs while
> > in-flight.
> >
> > Otherwise, if you host your workers remotely and you don't get a
> > network connection from the webserver to your worker, then you can
> > take a look at the new Edge Worker, which also streams logs in chunks
> > from the edge site to the central location.
> >
> > If you otherwise want to contribute, helping hands are always
> > welcome. The log handler structure, though, will probably change in
> > Airflow 3 soon. Note also the limitations of remote log stores: for
> > S3 / Azure Blob you cannot append chunks to an already-uploaded file.
> >
> > Jens
> >
> > On 08.03.25 16:49, Akash Sharma wrote:
> > > Hello everyone,
> > >
> > > Whenever remote logging is enabled, logs are only uploaded to the
> > > target path once the tasks have been completed. This makes it
> > > harder to monitor tasks that are long-running since there is no
> > > means of getting the logs.
> > >
> > > I was working on a Handler that saves the logs in chunks, where
> > > chunking is decided based on two factors:
> > >
> > >     1. Max time has elapsed since the last chunking was done
> > >     2. Max bytes have arrived since the last chunking was done
> > >
> > > So a chunk will be saved either when the max time has elapsed or
> > > when the file size limit has been surpassed. The chunked files can
> > > then be uploaded whenever they are created, and served by stitching
> > > them back together.
> > >
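> > > A rough (untested) sketch of the idea; the names and limits are
> > > illustrative, and a real version would also need a background timer
> > > so that an idle task still flushes when the time limit passes:
> > >
> > >     import logging
> > >     import os
> > >     import time
> > >
> > >     class ChunkedLogHandler(logging.Handler):
> > >         def __init__(self, chunk_dir, max_bytes=1 << 20,
> > >                      max_seconds=30.0):
> > >             super().__init__()
> > >             self.chunk_dir = chunk_dir
> > >             self.max_bytes = max_bytes
> > >             self.max_seconds = max_seconds
> > >             self.buffer = []
> > >             self.buffered_bytes = 0
> > >             self.last_flush = time.monotonic()
> > >             self.chunk_no = 0
> > >             os.makedirs(chunk_dir, exist_ok=True)
> > >
> > >         def emit(self, record):
> > >             msg = self.format(record) + "\n"
> > >             self.buffer.append(msg)
> > >             self.buffered_bytes += len(msg.encode("utf-8"))
> > >             # Cut a chunk when either limit is reached.
> > >             if (self.buffered_bytes >= self.max_bytes
> > >                     or time.monotonic() - self.last_flush
> > >                     >= self.max_seconds):
> > >                 self.flush()
> > >
> > >         def flush(self):
> > >             # Write buffered records out as the next chunk file;
> > >             # an uploader can ship finished chunks in order.
> > >             if self.buffer:
> > >                 path = os.path.join(
> > >                     self.chunk_dir,
> > >                     f"chunk_{self.chunk_no:06d}.log")
> > >                 with open(path, "w") as f:
> > >                     f.writelines(self.buffer)
> > >                 self.chunk_no += 1
> > >                 self.buffer = []
> > >                 self.buffered_bytes = 0
> > >             self.last_flush = time.monotonic()
> > >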
> > > Do let me know your thoughts.
> > >
> > > Best regards,
> > > Akash
> > >
> >
