I think we should make it explicit in the proposal then. It's not spelled out currently.
On Sat, May 9, 2026 at 9:23 PM Tzu-ping Chung <[email protected]> wrote:
>
> > On 10 May 2026, at 02:35, Jarek Potiuk <[email protected]> wrote:
> >
> > > Coordinators are Python. They are imported into Airflow. Not separate
> > > processes. Tasks run in Python, and the coordinator knows how to talk
> > > to them. How the messages are exchanged (not the messages themselves)
> > > is purely between the coordinator (Python) and Java, and the same goes
> > > for every other language coordinator-SDK pair. It is not public and
> > > thus not needed for the specification.
> >
> > Oh, I think that's a misunderstanding then. I do understand that the
> > "coordinator" part runs in Python on the Scheduler. My question is more
> > about where the task process executes, and whether it is long-running or
> > started for every task (and where).
> >
> > Just to compare the two executors:
> >
> > * LocalExecutor: The process for the task runs on the same "machine"
> >   (container etc.) as the Scheduler.
> > * CeleryExecutor: The process runs on whichever Celery worker pulls the
> >   task from the queue.
> >
> > So my assumption (and correct me if I am wrong, or if this is already
> > explained and I missed it):
> >
> > DagProcessor with Java Coordinator:
> > a) Will a new Java process start every time a new DAG is created, i.e.
> >    whenever the Java file is about to be parsed?
>
> One process for one round to parse one file. Same as Python DAG files. (In
> some sense it's more like the compiled JAR "parses itself" and returns the
> result.)
>
> > b) Or will a long-running Java process run locally that the DagProcessor
> >    will communicate with? Both approaches are possible, and each has
> >    different characteristics (performance, caching, warm-up time, JIT,
> >    potential pollution between several independent DAGs parsed by the
> >    same DagFileProcessor).
>
> Correct, but from my research the JVM does not make reusing a process easy
> without serious restrictions on which libraries people can use and/or how
> global state can be manipulated.
> All in all, not suitable for using Java in Airflow, IMO.
>
> I am not ruling out the possibility, though (my understanding of the JVM
> ecosystem has a lot of gaps, to put it mildly), so the AIP also does not
> really commit either way. It doesn't need to, since whether to launch
> long-running worker processes is still entirely up to the Java
> coordinator, and does not require a public interface change.
>
> Approach b may also make sense for some other runtimes, but that's out of
> scope for now.
>
> > Task Execution with Java Coordinator:
> > a) Will all Java tasks run locally as a new Java process on the same
> >    machine (container/machine) as the Scheduler?
> > b) Or will there be a long-running Java process (like in the
> >    DagFileProcessor case) that the Scheduler communicates with to
> >    execute the task?
> > c) Or will it depend on the executor? If we use CeleryExecutor with a
> >    Java coordinator, does that mean the task will run as a Python task
> >    on a Celery worker, and that Python task will create a Java process?
> > d) Or is there a long-running Java process started on the Celery worker
> >    in this case?
> > e) How about the Edge Executor? Same question regarding long-running
> >    versus new processes.
>
> It is run in the same environment as if the task were written in Python.
> Where that is depends on the executor.
>
> > I was under the impression that one of the promises of "run all
> > languages, everywhere" was that we could have a standalone "language"
> > component, running somewhere, that executes individual tasks in a given
> > language without the overhead of a Python interpreter starting every
> > time a task is started. This is what the talk "Run Airflow Tasks on a
> > Coffee Machine" basically promised:
> > https://airflowsummit.org/sessions/2025/run-airflow-tasks-on-your-coffee-machine
> > - and this is what the Edge Executor and its API promised to provide
> > "eventually". And we might depart from this vision...
> > But I think a diagram where we can see which processes are running
> > (Python / Java) and how long they run (a new process/interpreter per
> > task, or long-running) would be useful to understand what we are
> > proposing here.
>
> Having your coffee machine able to run Airflow tasks is a great idea, but
> not something needed by most prospective Airflow users, or by existing
> Airflow users who want to leverage Java. Those people want existing
> Airflow things to work with Java, and running Python alongside their Java
> program is the best way to achieve this. Your coffee machine, on the other
> hand, can't afford to run a Python interpreter, but probably also doesn't
> need its own secrets backend or custom XCom.
>
> This is also why the AIP makes no mention of the Edge Executor, while the
> linked talk considers it one of the big components. These are seemingly
> similar but ultimately very different goals, and they require different
> solutions. This AIP attempts to solve only one of them.
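For concreteness, the per-parse model described above ("one process for one round to parse one file"; "the compiled JAR parses itself and returns the result") could be sketched roughly as below. The command line, function name, and JSON payload are all hypothetical, not anything the AIP specifies, and a tiny Python process stands in for the JAR so the sketch is runnable:

```python
import json
import subprocess
import sys

def parse_dag_file(command: list[str], timeout: float = 60.0) -> dict:
    # Hypothetical sketch: the Python coordinator starts one short-lived
    # external process per parse round and reads the serialized DAG
    # structure the process prints to stdout. No long-running worker.
    proc = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout, check=True
    )
    return json.loads(proc.stdout)

# With a Java coordinator the command could be something like
# ["java", "-jar", "dags.jar", "--describe"] (made-up flags); here a
# small Python one-liner stands in for the JAR so the example runs:
stand_in = [
    sys.executable, "-c",
    'import json; print(json.dumps({"dag_id": "demo", "tasks": ["extract"]}))',
]
result = parse_dag_file(stand_in)
print(result["dag_id"])
```

Whether the child process is started fresh each round (as here) or kept warm is, per the discussion above, an internal decision of the coordinator and not part of any public interface.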
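Similarly, "it is run in the same environment as if the task were written in Python" seems to imply a thin Python wrapper that the executor schedules like any other task; the wrapper then launches the Java process on whatever machine the executor placed the task on (the Scheduler host for LocalExecutor, a Celery worker for CeleryExecutor). A minimal sketch under that assumption, with hypothetical names and a Python stand-in for the JVM:

```python
import subprocess
import sys

def run_language_task(command: list[str]) -> str:
    # Hypothetical sketch: this wrapper is what the executor actually runs,
    # so the Java child process starts on whichever machine the executor
    # chose (Scheduler host for LocalExecutor, a Celery worker for
    # CeleryExecutor, etc.). One fresh process per task instance.
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        raise RuntimeError(
            f"task process exited with {completed.returncode}: {completed.stderr}"
        )
    return completed.stdout

# Stand-in for e.g. ["java", "-cp", "tasks.jar", "com.example.ExtractTask"]
# (a made-up invocation), so the sketch is runnable without a JVM:
output = run_language_task([sys.executable, "-c", 'print("task done")'])
print(output.strip())
```

This is the per-task-process reading of the thread; the long-running-worker variants asked about in b) and d) would replace the `subprocess.run` call with communication to an already-running process.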
