potiuk commented on issue #21707:
URL: https://github.com/apache/airflow/issues/21707#issuecomment-1050171698


   > Python is inherently a bit slow to start up so don't expect any magic. 
Airflow is an extra bad case, though, since it imports so many 
other modules.
   > 
   > If you use the Docker image it is even slower (assume you do since using 
K8), because
   > 
   > 1. The entrypoint performs some `airflow db check` before starting any 
tasks. Not sure why. This takes 5 seconds sometimes.
   
   This is explained in the docs: 
   
https://airflow.apache.org/docs/docker-stack/entrypoint.html#waits-for-airflow-db-connection
   
   > The entrypoint is waiting for a connection to the database independent of 
the database engine. This allows us to increase the stability of the 
environment.
   
   The same page also documents how to disable this check:
   ```
   CONNECTION_CHECK_MAX_COUNT=0
   ```
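
For a one-off command, the variable can be passed to the container when it starts. A minimal sketch, assuming the `docker` CLI is available; the helper name `airflow_run_argv` is hypothetical and `apache/airflow` is only a placeholder image reference (use whatever image and tag you actually deploy):

```python
import shlex


def airflow_run_argv(image, airflow_args, skip_db_check=True):
    """Build a `docker run` argv for a one-off airflow command.

    `image` is whatever image reference you deploy (assumption: it uses
    the official entrypoint, which reads CONNECTION_CHECK_MAX_COUNT).
    """
    argv = ["docker", "run", "--rm"]
    if skip_db_check:
        # Documented entrypoint switch: 0 disables the DB connection check.
        argv += ["--env", "CONNECTION_CHECK_MAX_COUNT=0"]
    return argv + [image, "airflow", *airflow_args]


if __name__ == "__main__":
    # Only prints the command; paste it into a shell to run it.
    print(shlex.join(airflow_run_argv("apache/airflow", ["version"])))
```

The sketch only builds and prints the command, so it can be checked without Docker installed.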
   
   But you gave me the idea that we could run the check only for specific 
commands, so if you run airflow commands as "separate container" commands, this 
might help a bit.
   
   > 2. You lose the `.pyc`-caching since it starts a fresh container each 
time. I did some tests long ago by pre-baking the .pyc files by simply ending 
the Dockerfile with `RUN airflow --help` and it shaved off almost a complete 
second on subsequent docker runs. Maybe I should upstream this fix to the 
official image?
   
   This is a deliberate decision. Baking in .pyc files is a bad idea because it 
increases the size of the image significantly: you are basically trading off 
image size, network transfer, and storage against first-time startup for some 
commands.
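
To see how large that trade-off is for a given package, the bytecode size can be measured directly. A rough sketch using only the standard library; the helper name `bytecode_overhead` is hypothetical, and it writes `__pycache__` directories into the tree it scans, so point it at a scratch copy rather than a live install:

```python
import compileall
import tempfile
from pathlib import Path


def bytecode_overhead(source_dir):
    """Compile every .py file under source_dir and return (py_bytes, pyc_bytes)."""
    root = Path(source_dir)
    # force=True recompiles even when fresh .pyc files already exist;
    # quiet=1 suppresses the per-file listing and prints errors only.
    compileall.compile_dir(str(root), force=True, quiet=1)
    py_bytes = sum(p.stat().st_size for p in root.rglob("*.py"))
    pyc_bytes = sum(p.stat().st_size for p in root.rglob("*.pyc"))
    return py_bytes, pyc_bytes


if __name__ == "__main__":
    # Tiny stand-in tree; replace with a copy of the package you care about.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "example.py").write_text("def f(x):\n    return x + 1\n")
        py_bytes, pyc_bytes = bytecode_overhead(tmp)
        print(f"source: {py_bytes} B, bytecode: {pyc_bytes} B")
```

The second number approximates what pre-baked .pyc files would add to an image layer for that tree.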
   
   If you want to run airflow commands repeatedly, rather than starting a new 
container every time, run a single container and `exec` the command in the 
running container.
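
A sketch of that pattern, assuming the `docker` CLI, an image whose entrypoint passes `bash -c ...` through, and `airflow` on the PATH inside the container; the helper names and the container name `airflow-cli` are hypothetical:

```python
import shlex


def start_argv(image, name):
    """argv that starts one long-lived, idle container (run this once)."""
    # Assumption: `sleep infinity` is available in the image (GNU coreutils).
    return ["docker", "run", "--detach", "--name", name, image,
            "bash", "-c", "sleep infinity"]


def exec_argv(name, airflow_args):
    """argv that runs an airflow command inside the already-running container."""
    return ["docker", "exec", name, "airflow", *airflow_args]


if __name__ == "__main__":
    # Only prints the commands; paste them into a shell to run them.
    print(shlex.join(start_argv("apache/airflow", "airflow-cli")))
    print(shlex.join(exec_argv("airflow-cli", ["version"])))
```

`docker exec` starts the process directly in the existing container, so the image entrypoint is not executed again for each command.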
   
   
   @Wats0ns: `airflow version` SHOULD be fast (also in terms of .pyc) because it 
imports very little, so I also second @jedcunningham here: py-spy would be 
useful. I just checked my "airflow version" and looked at where the slowness 
might come from.
   
   Almost certainly it is your local_settings or your logging configuration. 
Parsing settings.py (and local settings) and establishing the logging 
configuration is the one thing that happens in `airflow version`.
   
   So please run py-spy and let us know by posting the results here (or most 
likely you will find where it comes from in your configuration).
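
As a quick complement to py-spy (not a replacement for it), the interpreter's own `-X importtime` option can show which imports dominate startup. A minimal sketch; the helper name `slowest_imports` is hypothetical, and `json` is only a stand-in for `airflow` or your own settings module:

```python
import subprocess
import sys


def slowest_imports(module, top=10):
    """Import `module` in a fresh interpreter and return the `top` slowest
    imports as (cumulative_microseconds, module_name), slowest first."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True, text=True, check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        # Data lines look like: "import time:  self_us | cumulative_us | name"
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3 or not parts[1].strip().isdigit():
            continue  # skips the header row and anything unexpected
        rows.append((int(parts[1]), parts[2].strip()))
    return sorted(rows, reverse=True)[:top]


if __name__ == "__main__":
    for micros, name in slowest_imports("json"):
        print(f"{micros / 1000:8.1f} ms  {name}")
```

This only measures import time; work done after import, such as applying a logging configuration, still needs a profiler like py-spy.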
   
   I am converting this one into a discussion until we hear more about the 
py-spy results, as this is likely not an airflow issue.
   
   
   
   



