roitvt commented on issue #7163: [AIRFLOW-6542] add spark-on-k8s 
operator/hook/sensor
URL: https://github.com/apache/airflow/pull/7163#issuecomment-595108351
 
 
   > Hello,
   > 
   > your pull request seems very interesting. Currently, I am using Airflow 
and Spark Operator on Kubernetes separately and this PR seems a very good 
solution to use them together.
   > 
   > Dealing with the code I would like to understand some points:
   > 
   > 1. Why are two different tasks used for creating and checking the status 
of Spark Application (**SparkKubernetesOperator** and 
**SparkKubernetesSensor**) ?
   >    Is it possible to merge them in a single Airflow operator ? In this way 
you can use only a single task in the DAG to manage a Spark Application.
   > 2. when a Custom resource is created/watched, I noticed the following part:
   > 
   > ```
   > group="sparkoperator.k8s.io",
   > version="v1beta2",
   > plural="sparkapplications",
   > ```
   > 
   > Why are these variables hard-coded here instead of retrieving them from 
the spark application definition (yaml) ?
   > 
   > Having a generic airflow operator that is able to manage all kinds of 
custom resources and not only the spark applications would be easier to re-use.
   > 
   > Thanks,
   > Alessio
   
   Thanks for your interest in my PR!!!
   1. I've seen the division to separate Operator and sensor also in the AWS 
EMR operator, and this is the way that we are currently working in my company, 
also the sensor has the built-in poke methods that check the condition and 
sleep. I know there is a way to call the sensor from the operator directly so  
let's say given a flag in the operator include_sensor=True after the creation 
done i'll call the sensor class. @mik-laj what is your opinion on this?
   2. I want to make this operator/sensor specific to sparkOperator so that's 
why I used hardcoded values so if you feeding the operator with the wrong YAML 
it won't pass. as for general CRD operator:
   I made a proposal on the mailing list: 
https://lists.apache.org/thread.html/r36fddadf0791718716829875e9a60674ef61a0ca598ed4e952d115fc%40%3Cdev.airflow.apache.org%3E
 and it seems that the right way is to make a more specific operator. but after 
this PR will pass I'll make another one with general CRD operator/sensor(the 
Kubernetes hook thanks to @mik-laj already has general CRD methods) 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to