blcksrx commented on issue #31183:
URL: https://github.com/apache/airflow/issues/31183#issuecomment-1577392774
> With regards to SparkKubernetesSensor, the Operator now remains Running
for the duration of the Spark job, which (I think) means some DAG topologies
will no longer work as expected. For example
>
> ```
> / ----> some other thing \
> Spark Job --> Final Thing
> \ ----> sensor ----------- /
> ```
>
> Will no longer run "some other thing" while the Spark Job is running, but
instead after the job is completed and the DAG will need to be re-arranged.
>
> This will mean either:
>
> 1. People will need to update their DAGs when they notice a runtime change
or
> 2. People will not notice the change, but will be running a useless
SparkKubernetesSensor
>
> This is why I would propose making this a separate operator, or behind
behavioral flags on the existing operator to maintain the existing behavior
while documenting an upgrade path.
This example has a logical flaw: "some other thing" is drawn as dependent on the `SparkJob`. If it were not dependent, it would be unrelated to the `SparkJob` and could run in parallel regardless; and if it is dependent, it should come after the `SparkSensor` anyway.
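To make the two wirings concrete, here is a minimal sketch in plain Python, with no Airflow dependency; the dicts just map each task to its upstream tasks, and all task names are illustrative, not real operator APIs:

```python
# Minimal sketch of the two DAG wirings discussed above, modeled as
# plain dependency dicts (task -> set of upstream tasks). No Airflow
# APIs are used; names like "spark_submit" are illustrative only.

def blocks_on(deps, task, blocking_task):
    """True if `task` can only start after `blocking_task` has finished."""
    upstream = set(deps.get(task, set()))
    frontier = set(upstream)
    while frontier:
        nxt = set()
        for t in frontier:
            nxt |= deps.get(t, set())
        nxt -= upstream
        upstream |= nxt
        frontier = nxt
    return blocking_task in upstream

# Old wiring: the operator only submits; a separate sensor waits
# for the Spark job to complete.
old = {
    "spark_submit": set(),
    "some_other_thing": {"spark_submit"},
    "spark_sensor": {"spark_submit"},      # this is the task that blocks
    "final_thing": {"some_other_thing", "spark_sensor"},
}

# New wiring: the operator itself blocks until the job finishes.
new = {
    "spark_job": set(),                    # submits AND waits
    "some_other_thing": {"spark_job"},
    "final_thing": {"some_other_thing"},
}

# Under the old wiring, "some_other_thing" never waits on the sensor,
# so it runs while the Spark job executes; under the new one it
# transitively waits on the blocking "spark_job" task.
print(blocks_on(old, "some_other_thing", "spark_sensor"))  # False
print(blocks_on(new, "some_other_thing", "spark_job"))     # True
```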
Airflow operators are responsible for running a job to completion, such as moving data between data sources; meanwhile they keep their state as *running* and also write the job's logs to the Airflow task output.
Hence it doesn't make sense to merely submit a Spark job and ignore it, or to double the work by using another operator (`SparkSensor`) to wait on it.
That said, an additional argument with a default value, such as `print_logs=False`, could solve this backward-compatibility issue.
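A rough sketch of how such a flag could behave. Note the hedges: `print_logs` is only the argument name proposed in this comment, not an existing Airflow parameter, and the class below is a stand-in for the real `SparkKubernetesOperator`, with the Kubernetes calls stubbed out:

```python
# Hedged sketch of the proposed backward-compatibility flag. This is
# a stand-in class, not the real SparkKubernetesOperator; `print_logs`
# is the argument name suggested above, not an existing Airflow option.

class SparkJobOperator:
    def __init__(self, print_logs: bool = False):
        # Default False preserves the old fire-and-forget behavior.
        self.print_logs = print_logs

    def execute(self):
        job_id = self._submit()            # always submit the Spark job
        if not self.print_logs:
            return job_id                  # old behavior: return right away
        for line in self._stream_logs(job_id):
            print(line)                    # new behavior: block and tail logs
        return job_id

    def _submit(self):
        return "job-123"                   # placeholder for the k8s submit call

    def _stream_logs(self, job_id):
        yield f"{job_id}: SUCCEEDED"       # placeholder for the log stream
```

With the default `print_logs=False`, existing DAGs would keep their current runtime shape; opting in to `print_logs=True` would make the task block for the job's duration and surface the logs in the task output.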