blcksrx commented on issue #31183:
URL: https://github.com/apache/airflow/issues/31183#issuecomment-1577392774
> With regards to SparkKubernetesSensor, the Operator now remains Running
for the duration of the Spark job, which (I think) means some DAG topologies
will no longer work as expected. For example
>
> ```
> / ----> some other thing \
> Spark Job --> Final Thing
> \ ----> sensor ----------- /
> ```
>
> Will no longer run "some other thing" while the Spark Job is running, but
instead after the job is completed and the DAG will need to be re-arranged.
>
> This will mean either:
>
> 1. People will need to update their DAGs when they notice a runtime change
or
> 2. People will not notice the change, but will be running a useless
SparkKubernetesSensor
>
> This is why I would propose making this a separate operator, or behind
behavioral flags on the existing operator to maintain the existing behavior
while documenting an upgrade path.
This example has a logical flaw: "some other thing" is drawn as dependent on the `SparkJob`. If it were not dependent, it would be unrelated to the `SparkJob` and could run in parallel regardless; and if it is dependent, it should come after the `SparkSensor` anyway.
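To make the two wirings concrete, here is a minimal sketch in plain Python, with no Airflow dependency; the dicts just map each task to its upstream tasks, and all task names are illustrative, not real operator APIs:

```python
# Minimal sketch of the two DAG wirings discussed above, modeled as
# plain dependency dicts (task -> set of upstream tasks). No Airflow
# APIs are used; names like "spark_submit" are illustrative only.

def blocks_on(deps, task, blocking_task):
    """True if `task` can only start after `blocking_task` has finished."""
    upstream = set(deps.get(task, set()))
    frontier = set(upstream)
    while frontier:
        nxt = set()
        for t in frontier:
            nxt |= deps.get(t, set())
        nxt -= upstream
        upstream |= nxt
        frontier = nxt
    return blocking_task in upstream

# Old wiring: the operator only submits; a separate sensor waits
# for the Spark job to complete.
old = {
    "spark_submit": set(),
    "some_other_thing": {"spark_submit"},
    "spark_sensor": {"spark_submit"},      # this is the task that blocks
    "final_thing": {"some_other_thing", "spark_sensor"},
}

# New wiring: the operator itself blocks until the job finishes.
new = {
    "spark_job": set(),                    # submits AND waits
    "some_other_thing": {"spark_job"},
    "final_thing": {"some_other_thing"},
}

# Under the old wiring, "some_other_thing" never waits on the sensor,
# so it runs while the Spark job executes; under the new one it
# transitively waits on the blocking "spark_job" task.
print(blocks_on(old, "some_other_thing", "spark_sensor"))  # False
print(blocks_on(new, "some_other_thing", "spark_job"))     # True
```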
Airflow operators are responsible for running a job to completion, such as moving data between data sources; meanwhile they keep their state as *running* and also write the job's logs to the Airflow task output.
Hence it doesn't make sense to merely submit a Spark job and ignore it, or to double the work by using another operator (`SparkSensor`) to wait on it.
That said, an additional argument with a default value, such as `print_logs=False`, could solve this backward-compatibility issue.
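A rough sketch of how such a flag could behave. Note the hedges: `print_logs` is only the argument name proposed in this comment, not an existing Airflow parameter, and the class below is a stand-in for the real `SparkKubernetesOperator`, with the Kubernetes calls stubbed out:

```python
# Hedged sketch of the proposed backward-compatibility flag. This is
# a stand-in class, not the real SparkKubernetesOperator; `print_logs`
# is the argument name suggested above, not an existing Airflow option.

class SparkJobOperator:
    def __init__(self, print_logs: bool = False):
        # Default False preserves the old fire-and-forget behavior.
        self.print_logs = print_logs

    def execute(self):
        job_id = self._submit()            # always submit the Spark job
        if not self.print_logs:
            return job_id                  # old behavior: return right away
        for line in self._stream_logs(job_id):
            print(line)                    # new behavior: block and tail logs
        return job_id

    def _submit(self):
        return "job-123"                   # placeholder for the k8s submit call

    def _stream_logs(self, job_id):
        yield f"{job_id}: SUCCEEDED"       # placeholder for the log stream
```

With the default `print_logs=False`, existing DAGs would keep their current runtime shape; opting in to `print_logs=True` would make the task block for the job's duration and surface the logs in the task output.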