jason810496 commented on PR #65958: URL: https://github.com/apache/airflow/pull/65958#issuecomment-4539301052
> Can you explain the overhead here? I think python dag parsers have the same overhead.

Before the Python DAG parsers actually import each file and start parsing, the parser does some fast filtering based on file extension and file content. For example, `list_py_file_paths` and `might_contain_dag` can filter out a large number of invalid bundles up front without any import or subprocess spawning; they are just plain file I/O.

https://github.com/apache/airflow/blob/0a2f1810aa32cfb0e47ce452779c76f437a2214a/airflow-core/src/airflow/dag_processing/manager.py#L859-L866

https://github.com/apache/airflow/blob/0a2f1810aa32cfb0e47ce452779c76f437a2214a/airflow-core/src/airflow/utils/file.py#L123-L161

In the case of the Java-SDK, as far as I know the JVM ecosystem, there are two common ways to deploy JARs (please correct me if I'm wrong):

1. A fat JAR (a.k.a. uber JAR) containing all the dependencies (the actual DAG definition modules, the Java-SDK itself, third-party libraries, etc.).
2. Each module shipped as its own JAR, so that users can upgrade or swap individual JARs as needed.

Both are valid deployment approaches. For the second one, we should support file-based discovery to avoid the overhead of loading every JAR just to see whether it actually contains a DAG.

> Do you think a specialized build tool will add friction for user adoption? Will there be a simpler way for people who dont want to use specialized?

IMHO, we can't really avoid introducing a build tool for compiled languages in the long term if we want to support further features. In the case of the Java-SDK, users already need to add the Airflow-Java-SDK dependency in Gradle, so it _should_ be fine to ask them to add one more line for the corresponding Gradle plugin as part of the Day-1 user education.
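To make the file-based discovery idea concrete, here is a minimal Python sketch of both kinds of pre-filter. It is an illustration under stated assumptions, not the actual Airflow implementation: `py_file_might_contain_dag` only approximates the spirit of a content heuristic like `might_contain_dag`, and the `Airflow-Dag-Bundle` manifest attribute is a hypothetical marker that a build plugin could write. It is not an existing Airflow or Java-SDK convention.

```python
import zipfile
from pathlib import Path

# Hypothetical manifest attribute that a build plugin could write into the JAR.
# This is an assumption for illustration, not an existing convention.
MARKER_ATTRIBUTE = "Airflow-Dag-Bundle"


def py_file_might_contain_dag(path: Path) -> bool:
    """Cheap content check: plain file I/O, the module is never imported."""
    content = path.read_bytes().lower()
    return b"dag" in content and b"airflow" in content


def jar_might_contain_dag(path: Path) -> bool:
    """Read only the JAR manifest; no JVM is started and no class is loaded."""
    if path.suffix != ".jar" or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF")
        except KeyError:
            return False  # the archive has no manifest entry
    manifest = raw.decode("utf-8", errors="replace")
    prefix = MARKER_ATTRIBUTE.lower() + ":"
    # Manifest attribute names are case-insensitive, so compare in lowercase.
    return any(line.lower().startswith(prefix) for line in manifest.splitlines())
```

With a marker like this, a directory of per-module JARs could be scanned using ordinary file I/O, and only the JARs that pass the check would be handed to the JVM-side parser. The marker has to be written at build time, which is where a build plugin would come in.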
For interpreted languages (or any language where the bundle itself is human-readable), we might have a chance to skip the build tool, since we could implement file-based filters similar to `list_py_file_paths` and `might_contain_dag` shown above.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
