kaxil commented on issue #67515:
URL: https://github.com/apache/airflow/issues/67515#issuecomment-4548272652

   Coming in late. I'd like to push back on the framing of this sweep, and 
propose we wait until Python 3.15 support, which adds [PEP 
810](https://peps.python.org/pep-0810/) -- [lazy 
imports](https://docs.python.org/3.15/whatsnew/3.15.html#whatsnew315-lazy-imports).
   
   ### The core problem: most of this is noise
   
   An Operator uses a Hook. The Hook imports the SDK. When a task runs, all 
three get loaded.
   
   Moving `from neo4j import GraphDatabase` from `Neo4jHook` module-level into 
`_create_driver()` doesn't change that chain. It just moves *when* the import 
happens. The worker that runs the Neo4j task pays the import cost either way: 
before, at parse; after, at first execute. Total wall-clock, peak memory, pod 
size: identical.
   
   The scheduler also does not process raw Dag files, that's just factually 
incorrect in Airflow 3 land!
   
   The only processes that actually save are ones that import the hook module 
but never call into the SDK:
   
   | Process | Calls into SDK? | Save |
   |---|---|---|
   | Dag-processor | No, never executes tasks | ~60 MB RSS, ~1s |
   | Worker running a non-Neo4j task in a DAG that mentions Neo4jOperator | No 
| ~60 MB, ~1s |
   | Worker running the Neo4j task itself | Yes | **0** |
   
   On `KubernetesExecutor` (one task per pod), the third row is the common 
case. End users perceive no change. The dag-processor benefit is real but small.
   
   The issue's headline numbers (e.g. 550ms/168 MB for `google`) come from 
measuring `import the whole provider` in isolation. Real DAG files do `from 
airflow.providers.google.cloud.operators.bigquery import 
BigQueryInsertJobOperator` and pay a fraction of that. And the heaviest 
google.* imports (`DEFAULT_RETRY`, `QueryJob`, `Row`, `Conflict`) are runtime 
values, not type-only, so they can't move to `TYPE_CHECKING` without 
restructuring the operator.
   
   ### Fail-fast loss is a real regression
   
   Module-level imports surface install/version errors at parse time via the 
`import_error` table and the UI banner. Moving driver imports into 
`_create_driver()` defers the error to first task execution, where it lands in 
worker logs scattered across pods. For the realistic failure mode (version skew 
between provider and SDK extra), that's a UX regression we're paying for a 
saving end users don't see.
   
   ### PEP 810 is the right path
   
   PEP 810 (Explicit Lazy Imports) was accepted Nov 3, 2025, targeting Python 
3.15: https://peps.python.org/pep-0810/ & 
https://docs.python.org/3.15/whatsnew/3.15.html#whatsnew315-lazy-imports
   
   Two things we can do without method-internal refactors -- but let's wait 
until we support Python 3.15:
   
   1. **`__lazy_modules__` polyfill, safe today:**
      ```python
      __lazy_modules__ = ['neo4j']
      from neo4j import Driver, GraphDatabase  # unchanged
      ```
      Pythons <3.15 ignore the attribute. 3.15+ treats the imports as lazy. No 
mock-path changes, no inline imports, no fail-fast loss on current Pythons.
   
   2. **Global flag in dag-processor (when we adopt 3.15):** 
`PYTHON_LAZY_IMPORTS=all` plus `sys.set_lazy_imports_filter(func)`. Every 
provider becomes lazy-loaded with zero per-provider PRs.
   
   The keyword form (`lazy from neo4j import GraphDatabase`) isn't usable in 
provider source until we drop <3.15, realistically 2028+.
   
   ### What I'd suggest
   
   1. **Land `TYPE_CHECKING` guards on type-only imports.** Pure win, no 
trade-off.
   2. **Lazy-load only conditionally-used deps** -- e.g. `pandas_gbq` inside 
`BigQueryHook.get_pandas_df`. The fail-fast loss is scoped to "users who never 
call this method never discover the install is broken."
   3. **Skip the broad inline-import sweep.** The Operator-Hook-SDK chain means 
it's mostly cost-shifting, not cost-saving.
   4. **Plan dag-processor `-X lazy_imports=all` adoption** with the 3.15 
support cycle.
   
   Thanks to everyone who's already put work into this, the engineering on the 
PRs is fine, this is feedback on whether the broader campaign is worth doing. 
If someone has dag-processor RSS numbers from a representative parse load 
showing larger savings, I'd reconsider -- but isolated-import benchmarks aren't 
the right measurement.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to