mik-laj opened a new pull request #7878: Load all jobs models at once
URL: https://github.com/apache/airflow/pull/7878
 
 
   We use model polymorphism in `airflow.jobs` classes, so we must load all 
classes each time. 
   
https://github.com/apache/airflow/blob/master/airflow/jobs/base_job.py#L65-L73
   Otherwise, SQLAlchemy will not be able to execute the queries correctly.
   So, in order not to load a lot of code, we should divide these classes into 
smaller ones. I think it won't be a problem to separate the loops 
(BackfillLoop, SchedulerLoop, LocalTaskJob) from the entity model. However, I 
would like to do it later. For now I am proposing a temporary solution. Loading 
all classes every time.
   
   This problem does not appear in CI tests, because then we load all classes. 
It also does not occur in real situations, because the current coupling of all 
classes causes that these models are loaded anyway, but it blocks 
refactoring.This makes it difficult to eliminate coupling between classes. In 
particular, it prevents me from completing this change.
   https://github.com/apache/airflow/pull/7806
   The problem sometimes occurs when running individual tests.
   
   I'm using the following code to check the impact of this change on Airflow.
   ```python
   import sys
   from contextlib import contextmanager
   
   
   class TraceImportResult:
       def __init__(self):
           self.new_modules = None
   
       def print_new_modules(self):
           for i, new_module in enumerate(sorted(trace_resuslt.new_modules)):
               print(f"{i}. {new_module}")
   
   
   @contextmanager
   def trace_imports():
       before_modules = set(sys.modules.keys())
       result = TraceImportResult()
   
       try:
           yield result
       finally:
           after_modules = set(sys.modules.keys())
           before_count = len(before_modules)
           after_count = len(after_modules)
           diff_count = after_count - before_count
           result.new_modules = after_modules - before_modules
           print(f"Modules loaded - before: {before_count}")
           print(f"Modules loaded - after:  {after_count}")
           print(f"Modules loaded - diff:   {diff_count}")
   
   
   if __name__ == "__main__":
       # Example:
   
       with trace_imports() as trace_resuslt:
           import airflow  # noqa # pylint: disable=unused-import
   
       trace_resuslt.print_new_modules()
       with trace_imports() as trace_resuslt:
           import airflow.jobs  # noqa # pylint: disable=unused-import
   
       trace_resuslt.print_new_modules()
   ```
   
   We learn from this that adding these imports will cause that an additional 
32 modules will be loaded each time the code `import airflow.jobs` is executed.
   ```
   0. airflow.executors
   1. airflow.executors.base_executor
   2. airflow.executors.executor_loader
   3. airflow.executors.local_executor
   4. airflow.executors.sequential_executor
   5. airflow.jobs
   6. airflow.jobs.backfill_job
   7. airflow.jobs.base_job
   8. airflow.jobs.local_task_job
   9. airflow.jobs.scheduler_job
   10. airflow.task
   11. airflow.task.task_runner
   12. airflow.task.task_runner.base_task_runner
   13. airflow.task.task_runner.standard_task_runner
   14. airflow.utils.asciiart
   15. airflow.utils.configuration
   16. airflow.utils.dag_processing
   17. airflow.utils.process_utils
   18. glob
   19. lockfile
   20. lockfile.linklockfile
   21. lockfile.pidlockfile
   22. multiprocessing.managers
   23. multiprocessing.pool
   24. psutil
   25. psutil._common
   26. psutil._compat
   27. psutil._psosx
   28. psutil._psposix
   29. psutil._psutil_osx
   30. psutil._psutil_posix
   31. setproctitle
   ```
   
   Without this change, loading individual modules will load the following 
amount of new modules.
   ``import airflow.jobs.base_job `` - 5
   ``import airflow.jobs.scheduler_job `` - 31
   ``import airflow.jobs.backfill_job`` - 11
   ``mport airflow.jobs.local_task_job`` - 24
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [ ] Description above provides context of the change
   - [ ] Unit tests coverage for changes (not needed for documentation changes)
   - [ ] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [ ] Relevant documentation is updated including usage instructions.
   - [ ] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to