Hi all,

I opened a ticket (https://github.com/apache/airflow/issues/24171) a while
back, and I just want to make sure it went stale deservedly :)

We used to have an issue with memory consumption on Airflow Celery workers,
where tasks were often killed by the OOM killer. Most of our workload was
running Spark jobs in YARN cluster mode via SparkSubmitHook. The main
driver of the high memory consumption was the spark-submit processes, each
of which took about 500 MB of memory even though, in YARN cluster mode,
they were doing essentially nothing. We changed the hook to kill the
spark-submit process right after YARN accepts the application, and to track
the status with "yarn application -status" calls instead, similar to how
Spark standalone mode is tracked today. The OOM issues went away.
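To make the idea concrete, here is a rough standalone sketch (not our actual
patch, and independent of the hook's internals). The function name
submit_and_track is made up for illustration, and it assumes the usual
yarn cluster mode log lines containing the application id, plus the
"State" / "Final-State" fields in "yarn application -status" output:

import re
import subprocess
import time

APP_ID_RE = re.compile(r"(application_\d+_\d+)")

def submit_and_track(spark_submit_cmd, poll_interval=30):
    """Launch spark-submit, kill it once YARN has the application,
    then poll `yarn application -status` until a terminal state."""
    proc = subprocess.Popen(
        spark_submit_cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    app_id = None
    # In yarn cluster mode spark-submit keeps printing "Application report
    # for application_... (state: ...)" lines; the first one appears once
    # YARN has accepted the app, so grab the id from it.
    for line in proc.stdout:
        m = APP_ID_RE.search(line)
        if m:
            app_id = m.group(1)
            break
    if app_id is None:
        proc.wait()
        raise RuntimeError("spark-submit exited without an application id")

    # The ~500 MB spark-submit JVM is only relaying status now, so kill it.
    proc.kill()
    proc.wait()

    # Poll YARN for the application state instead.
    while True:
        out = subprocess.check_output(
            ["yarn", "application", "-status", app_id], text=True
        )
        state = final_state = None
        for report_line in out.splitlines():
            key, _, value = report_line.partition(":")
            key, value = key.strip(), value.strip()
            if key == "State":
                state = value
            elif key == "Final-State":
                final_state = value
        if state in ("FINISHED", "FAILED", "KILLED"):
            if state != "FINISHED" or final_state != "SUCCEEDED":
                raise RuntimeError(f"{app_id} ended in {state}/{final_state}")
            return app_id
        time.sleep(poll_interval)

Polling does put a little load on the YARN resource manager, but one
short-lived CLI call per poll interval is far cheaper than keeping a
resident 500 MB JVM around per task.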

This seems like an issue that many other users with a similar usage
pattern should be hitting, unless they allocate unnecessarily large
amounts of memory to their Airflow workers. I'd like to know if anyone
else has had a similar experience. Is it worth working on getting our fix
into the upstream repo? Or has everyone else already switched to managed
Spark services and it's just us? :)
--
Tornike
