jason810496 commented on PR #65958:
URL: https://github.com/apache/airflow/pull/65958#issuecomment-4539301052

   > Can you explain the overhead here? I think python dag parsers have the 
same overhead.
   
   Before the Python DAG parser actually imports each file and starts the 
parsing process, it does some fast filtering based on file extension and file 
content. For example, `list_py_file_paths` and `might_contain_dag` can filter 
out a large number of non-DAG files up front without any import or subprocess 
spawning; they are just plain file I/O.
   
   
https://github.com/apache/airflow/blob/0a2f1810aa32cfb0e47ce452779c76f437a2214a/airflow-core/src/airflow/dag_processing/manager.py#L859-L866
   
https://github.com/apache/airflow/blob/0a2f1810aa32cfb0e47ce452779c76f437a2214a/airflow-core/src/airflow/utils/file.py#L123-L161
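
   To make the "plain file I/O" point concrete, here is a minimal sketch of 
this kind of pre-filter. It is a simplification and not the code behind the 
links above: the function names `might_contain_dag_sketch` and 
`list_candidate_files` are hypothetical, and the heuristic (the file mentions 
both `dag` and `airflow`, case-insensitively) is assumed for illustration.

```python
from pathlib import Path


def might_contain_dag_sketch(path: Path) -> bool:
    """Cheap content check: no import, no subprocess, just one file read.

    Hypothetical simplification of a content-based heuristic.
    """
    if path.suffix != ".py":
        return False
    content = path.read_bytes().lower()
    # Assumed heuristic: a DAG file mentions both "dag" and "airflow".
    return all(token in content for token in (b"dag", b"airflow"))


def list_candidate_files(root: Path) -> list[Path]:
    """Walk a directory and keep only files worth handing to the real parser."""
    return sorted(p for p in root.rglob("*.py") if might_contain_dag_sketch(p))
```

   Only the files returned by `list_candidate_files` would then go through the 
expensive import step.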
   
   In the case of the Java-SDK, as far as I know about the JVM ecosystem, 
there are two common ways to deploy JARs (please correct me if I'm wrong):
   
   1. A fat JAR (a.k.a. uber JAR) containing all the dependencies (the actual 
DAG definition modules, the Java-SDK itself, third-party libraries, etc.).
   2. Each module shipped as its own JAR, so that users can upgrade or swap 
individual JARs as needed.
   
   Both are valid deployment approaches. For the latter, we should support 
file-based discovery to avoid the overhead of loading every JAR just to see 
whether it actually contains a DAG.
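
   As a rough sketch of what file-based discovery for JARs could look like: a 
JAR is a ZIP archive, so its entry list can be read without starting a JVM or 
loading any class. The marker entry `META-INF/airflow-dags.txt` below is purely 
hypothetical (nothing in the Java-SDK defines it); it stands in for whatever 
metadata a build-tool plugin might write into a JAR that contains DAGs. The 
function names are assumptions too.

```python
import zipfile
from pathlib import Path

# Hypothetical marker entry; assumed to be added at build time to DAG JARs.
MARKER_ENTRY = "META-INF/airflow-dags.txt"


def jar_might_contain_dag(jar_path: Path) -> bool:
    """Check the archive's entry list only; no JVM start, no class loading."""
    try:
        with zipfile.ZipFile(jar_path) as jar:
            return MARKER_ENTRY in jar.namelist()
    except (zipfile.BadZipFile, OSError):
        # Unreadable or corrupt archives are skipped rather than parsed.
        return False


def list_candidate_jars(root: Path) -> list[Path]:
    """Keep only the JARs that carry the (assumed) DAG marker entry."""
    return sorted(p for p in root.rglob("*.jar") if jar_might_contain_dag(p))
```

   With one JAR per module, library-only JARs would be dropped by this check 
and only the remaining candidates would need to be loaded.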
   
   > Do you think a specialized build tool will add friction for user adoption? 
Will there be a simpler way for people who don't want to use a specialized tool?
   
   IMHO, we can't really avoid introducing a build tool for compiled languages 
in the long term if we want to support further features. In the case of the 
Java-SDK, users already need to add the Airflow-Java-SDK dependency in Gradle, 
so it _should_ be fine to ask them to add one more line for the corresponding 
Gradle plugin as part of the Day-1 user education.
   
   For interpreted languages (or any language where the bundle itself is 
human-readable), we might have a chance to skip the build tool, since we could 
implement file-based filters similar to `list_py_file_paths` and 
`might_contain_dag` shown above.


