potiuk commented on issue #67515: URL: https://github.com/apache/airflow/issues/67515#issuecomment-4559994135
One more thing you might learn from for the next time you use Claude and do not understand what it does.

The basic problem with your script (the one you Clauded) is that it measures importing **all** modules from all the packages in the provider. Which is completely nuts. No wonder you get google as the biggest offender - it has the biggest number of modules and packages. This is **NOT** what happens in Airflow when a Dag is parsed and a Task is executed. Not even close.

These are the google provider stats:

```
Google provider (providers/google/src/):
  - 49 packages (directories with __init__.py)
  - 279 modules (.py files excluding __init__.py)
  - 328 total .py files
```

Yes... If you import 279 modules (which your AI-generated script does) - it can take a LOT of time. But if your Dag does:

```
from airflow.providers.google.cloud.dataproc import DataprocStartClusterOperator
```

It will load very few of those modules - all of them needed to get the right types, validate them, import the classes used as the internal representation needed to construct the objects, etc.

Your measurements are not measuring the savings you can get here - all those imports are most likely needed anyway to create DataprocStartClusterOperator. Your measurements actually show something different. They do not even show the effect of making some imports lazy - because you have not checked which of those imports actually **can** be made lazy. Your reports basically show what savings you can get if you load "all modules and all packages from the providers" vs. "not loading them at all". None of this is even close to anything realistic - neither what Airflow currently does, nor anything that you can achieve with lazy imports.

So I suggest you go back to the drawing board and ask your Claude to generate a real measurement if you want to advocate for the lazy loading idea for providers.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
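
A minimal sketch of the kind of measurement the comment asks for: time a single import, the way a Dag file would do it, and list which modules that one import actually pulled in. The helper name `measure_import` and the default target `json` are illustrative assumptions, not part of Airflow or of the script under discussion. The numbers are only meaningful in a fresh interpreter, because modules already present in `sys.modules` are not loaded again.

```python
import importlib
import sys
import time


def measure_import(module_name):
    """Import one module; return (elapsed_seconds, newly_loaded_module_names).

    Hypothetical helper for illustration. Only modules that were not yet in
    sys.modules before the call are counted as newly loaded.
    """
    before = set(sys.modules)
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    new_modules = sorted(set(sys.modules) - before)
    return elapsed, new_modules


if __name__ == "__main__":
    # Pass the dotted path of the module your Dag really imports;
    # "json" is only a stand-in so the sketch runs without Airflow installed.
    target = sys.argv[1] if len(sys.argv) > 1 else "json"
    elapsed, new_modules = measure_import(target)
    print(f"{target}: {elapsed * 1000:.1f} ms, {len(new_modules)} new modules")
    for name in new_modules:
        print("  ", name)
```

Run it once per target in a separate process so earlier imports do not hide the cost. For a per-module breakdown, CPython's built-in `python -X importtime -c "import <module>"` reports the time spent in each import it triggers.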
