wanderijames edited a comment on pull request #17178: URL: https://github.com/apache/airflow/pull/17178#issuecomment-885423290
> We currently have one operator that allows us to run Spark job on Kubernetes. It works with both EKS and GCP as well as any other Kubernetes platform.
> - [SparkKubernetesOperator](https://github.com/apache/airflow/blob/d72b363929c86eb03fc9583002459bd10bc7eaeb/airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py#L24).
>
> Why would anyone use this operator instead of the generic operator for Kubernetes?

Hey @mik-laj, I am aware of the Apache Spark and Livy operators and also the EMR operator. However, EMR on EKS works differently because EMR launches a virtual cluster in your EKS cluster. The pods it launches (Spark master and executors) are ephemeral: they exist only when a job start is invoked. For more information, kindly visit https://aws.amazon.com/emr/features/eks/

In addition, SparkKubernetesOperator is only suitable if a Spark cluster has already been set up in Kubernetes. In this case, one has not.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
