potiuk commented on issue #21707: URL: https://github.com/apache/airflow/issues/21707#issuecomment-1050171698
> Python is inherently a bit slow to start up so don't expect any magic. Airflow though is a bit of an extra bad case though since it imports so many other modules.
>
> If you use the Docker image it is even slower (assume you do since using K8), because
>
> 1. The entrypoint performs some `airflow db check` before starting any tasks. Not sure why. This takes 5 seconds some times.

This is explained in the docs: https://airflow.apache.org/docs/docker-stack/entrypoint.html#waits-for-airflow-db-connection

> The entrypoint is waiting for a connection to the database independent of the database engine. This allows us to increase the stability of the environment.

The documentation also describes how to disable this check:

```
CONNECTION_CHECK_MAX_COUNT=0
```

But you gave me a thought: we could run the check only for specific commands - so if you run airflow commands as "separate container" commands, this might help a bit.

> 2. You loose the `.pyc`-caching since it starts a fresh container each time. I did some test long ago by pre-baking the .pyc files by simply ending the Dockerfile with `RUN airflow --help` and it shaved off almost a complete second on subsequent docker runs. Maybe i should upstream this fix to the official image?

This is a deliberate decision. Baking in .pyc files is a bad idea because it increases the size of the image significantly (you are basically trading image size, network, and storage for a faster first start of some commands). If you want to run airflow commands repeatedly, then rather than starting a new container every time, run a single container and `exec` the command in the running container.

@Wats0ns : `airflow version` SHOULD be fast (also in terms of .pyc) because it imports very little, so I also second @jedcunningham here - py-spy would be useful. I just checked my `airflow version` and looked at where the slowness might come from. Almost certainly it is your local_settings or your log configuration.
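For a quick first look before running py-spy, here is a rough sketch. It is not py-spy and not something taken from the Airflow docs: it relies on CPython's standard `-X importtime` flag (Python 3.7+) to list the slowest imports of a module in a fresh interpreter. The helper names `parse_importtime` and `slowest_imports` are made up for illustration. It only attributes time to imports, so anything that runs outside an import will not show up and py-spy remains the more complete tool.

```python
import subprocess
import sys


def parse_importtime(stderr_text):
    """Parse `python -X importtime` output into (cumulative_us, self_us, module) tuples."""
    rows = []
    for line in stderr_text.splitlines():
        if not line.startswith("import time:"):
            continue  # unrelated stderr output
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        try:
            self_us = int(parts[0])
            cumulative_us = int(parts[1])
        except ValueError:
            continue  # header line: "self [us] | cumulative | imported package"
        rows.append((cumulative_us, self_us, parts[2].strip()))
    return rows


def slowest_imports(module, top=10):
    """Import `module` in a fresh interpreter and return its slowest imports first."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Tuples sort by cumulative time first, so reverse order puts the slowest on top.
    return sorted(parse_importtime(result.stderr), reverse=True)[:top]
```

For example, in an environment where Airflow is installed, `slowest_imports("airflow")` returns `(cumulative_us, self_us, module)` tuples with the slowest import first. If a module from your own configuration shows up near the top, that is a hint about where to look.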
Parsing settings.py (and local settings) and establishing the logging configuration is the one thing that happens in `airflow version`. So please - run py-spy and let us know here by posting the results (or, most likely, you will find where it comes from in your configuration).

I am converting this one into a discussion until we hear more about the py-spy results, as this is likely not an Airflow issue.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
