vandonr-amz commented on PR #30495:
URL: https://github.com/apache/airflow/pull/30495#issuecomment-1499348530

   >     1. I think we will need some more detailed benchmarks
   
   Sure, would be happy to do so. I kinda lack a "representative" set of dags. 
Maybe I can see if I can use the example dags.
   I tested this mostly with breeze, is there other configurations you'd like 
to see in a benchmark ?
   
   >     2. I think there is a flaw in just reading lines from python code, 
there are - theorethically at lest - cases where imports are broken across 
multiple lines. Reading lines from python sounds a bit hack-ish (AST parsing 
would be a bit slower but much better).
   
   Yes, I was partly aware that parsing python by hand was hacky terrain, but 
also, thinking about it, I didn't really see examples that would make this code 
fail. Do you have an actual valid python code example in mind ?
   I'll take a look at AST parsing, thanks for the pointer.
   
   >     3. Should not it be better to import all airflow packages upfront 
(excluding providers maybe)? That sounds like a much more robust solution, you 
only do it once at the start of DAG file processor and you can even stop 
parsing DAG files at this point.
   
   Yes and no. Yes importing everything upfront would probably work (not 
tested), but it's quite a lot, I don't know if there are other drawbacks to it ?
   And if we don't import providers code, then we lose a good chunk of the 
performance gains from this PR because I imagine as soon as users are going to 
use operators, they're going to import provider packages, which is slow too.
   It'd still be better than before, but not as fast as it could be.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to