Hello everyone,

It happens more and more often (and seems that it is pretty
widespread among "small" users of Airflow using Docker Compose and official
Airflow image that they are using _PIP_ADDITIONAL_REQUIREMENTS for
production usage - even if building and hosting your modified image is as
easy as copy & pasting few lines to a Dockerfile and running `docker build`
following by `docker push.

While I understand (and that's why we have it from the very beginning) why
you would like to use it for local quick iteration and testing, using it
for anything close to production has a lot of negative consequences and
leads to mysterious errors that take a lot of time for those users who are
trying to investigating some dependency problems, slow start of containers,
race conditions and it makes them prone to security issues.

I thought about whether we should remove the feature completely (it is
clearly marked as "do not use it for anything in production - so we could).
But I thought that a better solution could be to make the default image of
ours simply unusable for production use if you use the flag.

So I came up with this proposed PR:
https://github.com/apache/airflow/pull/33879

It has a few protections against other side effects (such as accidentally
downgrading or upgrading airflow by adding conflicting requirements, and
verification if the dependencies pass `pip check`) - but the gist of it -
the container will shut down automatically after 10 minutes if it has been
started with _PIP_ADDITIONAL_REQUIREMENTS (and will print instructions what
to do - how to build your image - to avoid it). And we have much more
comprehensive and detailed instructions on how to build your image and even
how to integrate the "--build" docker-compose workflow to do the same
following the way docker-compose was designed.

Yes. It is super aggressive. Yes it will break a number of user's workflows
(especially when they are using 3rd-party charts that encourage using
_PIP_ADDITIONAL_REQUIREMENTS) - part of the problem is that some charts out
there are exposing this as a "production functionality" despite it not
being one.

But IMHO - using the variable in production does more harm than good and it
has very easy alternatives, which users don't use because they choose
shortcuts without realising the consequences - which is not the way you
should treat a production environment.

Also restarting the container after 10 minutes is pretty consistent with
the usage that _PIP_ADDITIONAL_REQUIRENENTS has been added for. Restarting
the container in a quick test/dependency iteration environment regularly
could even be seen as a "feature" if you approach the "formal" status of
this variable.

Also this has a bit of a "sneaky" way of tricking people into
"good practices" learning. In order to avoid the restart you really need to
.... build your own image ... So this has also an educational aspect -
people will have to learn about building the image anyway if they will want
to learn how to "not build their images" and use something like the
variable again and it will be even more complex than just adding the
requirements to the custom image - so they will literally have to learn the
"good practice".

It is aggressive, yes. I wonder what you think. Is it too aggressive?
Should we do it?

Maybe there are really good reasons why it's impossible to use your own
custom images other than the extra step and  annoyance of being able to
push your images to some - public (free) or private (internal) registry. I
can't imagine how someone deploying docker-compose or k8s does not have the
way of having their own images available. Even if it means . that you have
to go through some kind of infra of yours you generally should do for
production.

Looking forward to your comments :D

J.

Reply via email to