Hello all,

I'm currently setting up a prototype of Airflow with the Celery executor.
When a Celery worker runs as the root user, it exits with an error saying
that running as root with pickle enabled is insecure (reference
<http://docs.celeryproject.org/en/latest/whatsnew-3.1.html?highlight=root%20user%20with%20pickle#in-other-news>).

We can fix that by running our Docker container as non-root, but I'd also
like to remove pickle from Celery's accept_content setting altogether.  I
know most Celery settings, such as broker_url, can be overridden in
airflow.cfg, but that doesn't seem possible for accept_content, which is
hardcoded here:

https://github.com/apache/incubator-airflow/blob/16b5f9a196b5b1849c162005e1254a0ba6f45893/airflow/config_templates/default_celery.py#L28

I noticed that I can set celery_config_options in airflow.cfg to point to a
dict other than default_celery.DEFAULT_CELERY_CONFIG, and in that dict
change the value to:

{
    'accept_content': ['json'],
    ...
}

But is there a particular reason some Celery settings can be overridden
directly in airflow.cfg while others need an indirect override like this?
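
For illustration, here is a minimal sketch of what such an override module
might look like. The dict below is only a stand-in for
default_celery.DEFAULT_CELERY_CONFIG (its keys and values are my assumption,
not copied from Airflow), and the name CELERY_CONFIG is hypothetical:

```python
import copy

# Stand-in for airflow.config_templates.default_celery.DEFAULT_CELERY_CONFIG.
# These keys and values are illustrative assumptions, not copied from Airflow.
DEFAULT_CELERY_CONFIG = {
    'accept_content': ['json', 'pickle'],
    'event_serializer': 'json',
}

# Deep-copy so the defaults are left untouched, then accept JSON only.
CELERY_CONFIG = copy.deepcopy(DEFAULT_CELERY_CONFIG)
CELERY_CONFIG['accept_content'] = ['json']
```

celery_config_options in airflow.cfg would then point at this dict. Any
serializer settings in the real defaults that still use pickle would
presumably need to change as well, since accept_content is the list of
content types a worker will accept.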

By the way, it looks like Celery's default accept_content setting changed
to JSON only in Celery 4.0 (references 1
<http://docs.celeryproject.org/en/latest/whatsnew-3.1.html?highlight=root%20user%20with%20pickle#last-version-to-enable-pickle-by-default>,
2
<http://docs.celeryproject.org/en/latest/userguide/configuration.html#accept-content>).

Given the plan to remove pickling in Airflow 2.0, perhaps it makes sense to
change the Airflow default to match as well?

Would it be worthwhile for me to submit a PR that reads the other Celery
settings using configuration.get(...) and/or updates this default value?
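
As a rough sketch of the idea, the snippet below reads accept_content from a
[celery] section and falls back to JSON only when the option is unset. It
uses the standard library's configparser rather than Airflow's configuration
module so that it runs on its own; the accept_content option name, the
comma-separated format, and the get_accept_content helper are all
hypothetical:

```python
import configparser


def get_accept_content(parser, default=('json',)):
    """Return accept_content from [celery] as a list, or the default if unset."""
    raw = parser.get('celery', 'accept_content', fallback=None)
    if raw is None:
        return list(default)
    # Hypothetical format: a comma-separated list such as "json, yaml".
    return [item.strip() for item in raw.split(',') if item.strip()]


# Example config text standing in for airflow.cfg.
parser = configparser.ConfigParser()
parser.read_string(
    "[celery]\n"
    "broker_url = redis://localhost:6379/0\n"
    "accept_content = json, yaml\n"
)

celery_config = {
    'broker_url': parser.get('celery', 'broker_url'),
    'accept_content': get_accept_content(parser),
}
```

With no accept_content line in the config, get_accept_content returns
['json'], which would match the Celery 4.0 default mentioned above.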

Thank you,
Taylor Edmiston
