Hi,

Based on the gang scheduling tests that Bowen Li and Chaoran Yu ran with
the Spark operator, we found that the behaviour around the operator was
far from optimal. YUNIKORN-558
<https://issues.apache.org/jira/browse/YUNIKORN-558> was logged to help
with the integration.
We did not put any development or test time into making sure the operator
and gang scheduling worked together. The behaviour that was observed was
not linked to gang scheduling but to the generic way the operator
implementation works in YuniKorn.

The current Spark operator, implemented
in pkg/appmgmt/sparkoperator/spark.go, listens to the Spark CRD
add/update/delete. Each CRD is then converted into an application inside
YuniKorn and processed. The pods created by the Spark operator form the
other half of the application. However, the CRD has its own application ID,
which differs from the application ID of the Spark pods (drivers and
executors).

This leaves us with two applications in the system: one without pods
(CRD-based) and one with pods (the real workload). The real workload pods
have an owner reference set to the CRD. Having two applications for one
real workload is confusing. It does not work correctly in the UI and causes
issues on completion and on recovery after a restart.

The proposal is now to merge the two objects into one application inside
YuniKorn. The CRD can still be used to track updates and provide events for
scheduling etc. The "ApplicationID" set in the driver or executor pods
should be used to track this application.
The owner reference allows linking the real pods back to the CRD. The CRD
will be used to provide the life cycle tracking and as an event collector.
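
As a rough illustration of the merge described above, here is a minimal
sketch in Go. It is not the YuniKorn implementation: every type and
function name (OwnerRef, Pod, Application, Registry, AddPod, OnCRDEvent)
is hypothetical, the owner kind "SparkApplication" is an assumption, and
the real Kubernetes and YuniKorn types are replaced by plain structs. It
only shows the idea: the pod's application ID is the key, and the owner
reference links the CRD to that same application.

```go
package main

// NOTE: all types below are hypothetical simplifications for illustration.
// They are not the Kubernetes or YuniKorn API types.

// OwnerRef is a reduced stand-in for a Kubernetes owner reference.
type OwnerRef struct {
	Kind string
	UID  string
}

// Pod is a reduced view of a Spark driver or executor pod.
type Pod struct {
	Name          string
	ApplicationID string // assumed to be set on the pod, as in the proposal
	Owners        []OwnerRef
}

// Application is the single merged object tracked by the scheduler.
type Application struct {
	ID     string
	CRDUID string   // UID of the owning CRD, used for life cycle tracking
	Pods   []string // names of the pods that belong to the application
	Events []string // events collected from CRD add/update/delete
}

// Registry keeps one application per application ID and remembers which
// CRD (by UID) is linked to which application.
type Registry struct {
	apps  map[string]*Application
	byCRD map[string]string
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{
		apps:  make(map[string]*Application),
		byCRD: make(map[string]string),
	}
}

// AddPod registers the pod under the application ID carried by the pod.
// If the pod has an owner reference of the assumed kind "SparkApplication",
// the CRD is linked to the same application instead of creating a second one.
func (r *Registry) AddPod(p Pod) *Application {
	app, ok := r.apps[p.ApplicationID]
	if !ok {
		app = &Application{ID: p.ApplicationID}
		r.apps[p.ApplicationID] = app
	}
	app.Pods = append(app.Pods, p.Name)
	for _, o := range p.Owners {
		if o.Kind == "SparkApplication" {
			app.CRDUID = o.UID
			r.byCRD[o.UID] = app.ID
		}
	}
	return app
}

// OnCRDEvent records a CRD event against the linked application.
// It returns false when no pod has linked this CRD to an application yet.
func (r *Registry) OnCRDEvent(crdUID, event string) bool {
	id, ok := r.byCRD[crdUID]
	if !ok {
		return false
	}
	app := r.apps[id]
	app.Events = append(app.Events, event)
	return true
}
```

In this sketch a CRD event that arrives before any pod is seen is simply
reported as unlinked. How such early events should be buffered or handled
is left open here and would be part of the real design.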

All these changes require rework on the app management side. I hope the
proposal sounds like the correct way forward. The same CRD-based mechanism
also seems to fit in with the way the Flink operator works.
Please provide some feedback on this proposal. Implementation would require
changes in app management and the related unit tests. Recovery and gang
scheduling tests should also be covered under this change.

Wilfred
