nailo2c commented on PR #65991:
URL: https://github.com/apache/airflow/pull/65991#issuecomment-4547672720

   Hi @amoghrajesh, I tested the Kerberos path, and it works as expected.
   
   1. I used a Kerberos-enabled GCP Dataproc cluster and a GCE VM as my test environment.
       1. Dataproc
           <img width="1011" height="366" alt="dataproc" src="https://github.com/user-attachments/assets/c2823fe2-5b39-4c24-b340-2507fba7315d" />
       2. GCE (for Airflow/Breeze and the Dataproc master node)
           <img width="1423" height="404" alt="gce" src="https://github.com/user-attachments/assets/558bc123-3379-4cf0-b700-918afffbdd69" />
   2. Test Dags
   
       <details>
       <summary>With RM REST auth</summary>
   
       ```python
       from datetime import datetime
   
       from requests_kerberos import HTTPKerberosAuth, REQUIRED
   
       from airflow.models import DAG
       from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
   
       with DAG(
           dag_id="spark_yarn_repro_24171_kerberos_rm",
           schedule=None,
           start_date=datetime(2026, 1, 1),
           catchup=False,
           tags=["repro", "issue-24171", "kerberos", "rm-rest"],
       ):
           SparkSubmitOperator(
               task_id="spark_pi_yarn_cluster_kerberos_rm",
               application="/opt/airflow/dev/.issue-24171/spark/examples/jars/spark-examples_2.12-3.5.3.jar",
               java_class="org.apache.spark.examples.SparkPi",
               application_args=["200"],
               conn_id="spark_yarn_kerberos_rm",
               deploy_mode="cluster",
               files="/opt/airflow/dev/.issue-24171/gcp-kerberos/krb5.conf",
               name="airflow-pi-cluster-kerberos-rm",
               conf={
                   "spark.executor.instances": "1",
                   "spark.executor.memory": "512m",
                   "spark.driver.memory": "512m",
                   "spark.driver.extraJavaOptions": "-Djava.security.krb5.conf=krb5.conf",
                   "spark.executor.extraJavaOptions": "-Djava.security.krb5.conf=krb5.conf",
               },
               keytab="/opt/airflow/dev/.issue-24171/gcp-kerberos/airflow.keytab",
               principal="airflow",
               use_krb5ccache=True,
               yarn_track_via_rm_api=True,
               yarn_rm_auth=HTTPKerberosAuth(mutual_authentication=REQUIRED),
               status_poll_interval=5,
               verbose=True,
           )
       ```
   
       </details>
   
       <details>
       <summary>Without RM REST auth</summary>
   
       ```python
       from datetime import datetime
   
       from airflow.models import DAG
       from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
   
       with DAG(
           dag_id="spark_yarn_repro_24171_kerberos_rm_noauth",
           schedule=None,
           start_date=datetime(2026, 1, 1),
           catchup=False,
           tags=["repro", "issue-24171", "kerberos", "rm-rest", "negative"],
       ):
           SparkSubmitOperator(
               task_id="spark_pi_yarn_cluster_kerberos_rm_noauth",
               application="/opt/airflow/dev/.issue-24171/spark/examples/jars/spark-examples_2.12-3.5.3.jar",
               java_class="org.apache.spark.examples.SparkPi",
               application_args=["200"],
               conn_id="spark_yarn_kerberos_rm",
               deploy_mode="cluster",
               files="/opt/airflow/dev/.issue-24171/gcp-kerberos/krb5.conf",
               name="airflow-pi-cluster-kerberos-rm-noauth",
               conf={
                   "spark.executor.instances": "1",
                   "spark.executor.memory": "512m",
                   "spark.driver.memory": "512m",
                   "spark.driver.extraJavaOptions": "-Djava.security.krb5.conf=krb5.conf",
                   "spark.executor.extraJavaOptions": "-Djava.security.krb5.conf=krb5.conf",
               },
               keytab="/opt/airflow/dev/.issue-24171/gcp-kerberos/airflow.keytab",
               principal="airflow",
               use_krb5ccache=True,
               yarn_track_via_rm_api=True,
               yarn_rm_auth=None,
               status_poll_interval=1,
               verbose=True,
           )
       ```
   
       </details>
   
   3. Results
       1. The Dag with `yarn_rm_auth` succeeded, and the Dag without `yarn_rm_auth` failed with HTTP 401 while polling the YARN RM REST API, as expected.
           <img width="1427" height="599" alt="airflow_ui" src="https://github.com/user-attachments/assets/e8dc769e-7017-4135-b864-60d95579ce9c" />
           <img width="1454" height="933" alt="401_auth_error" src="https://github.com/user-attachments/assets/050a8b16-54f5-467f-ad42-e93a18ce0a05" />
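
For readers who want to see why the no-auth Dag fails, here is a minimal sketch of polling the YARN RM application-state endpoint (`/ws/v1/cluster/apps/{appid}/state`). This is not the provider's implementation; `poll_app_state`, `RMAuthError`, and the injected `http_get` callable are hypothetical names used only for illustration, and the HTTP call is injected so the sketch runs without a cluster.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Terminal YARN application states reported by the ResourceManager REST API.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}

# Hypothetical transport signature: takes a URL, returns (status_code, json_body).
HttpGet = Callable[[str], Tuple[int, Dict[str, Any]]]


class RMAuthError(RuntimeError):
    """Raised when the ResourceManager answers with HTTP 401."""


def poll_app_state(
    http_get: HttpGet,
    rm_url: str,
    app_id: str,
    interval: float = 5.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the RM until the application reaches a terminal state."""
    url = f"{rm_url.rstrip('/')}/ws/v1/cluster/apps/{app_id}/state"
    polls = 0
    while True:
        status, payload = http_get(url)
        if status == 401:
            # On a Kerberized cluster an unauthenticated request is rejected here.
            raise RMAuthError(f"HTTP 401 from {url}: Kerberos/SPNEGO auth required")
        if status != 200:
            raise RuntimeError(f"Unexpected HTTP {status} from {url}")
        state = payload["state"]
        if state in TERMINAL_STATES:
            # Note: FINISHED only means the app ended; finalStatus may still be FAILED.
            return state
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"{app_id} still {state} after {polls} polls")
        sleep(interval)


# Possible wiring with requests + requests_kerberos (assumed installed, not run here):
#
#   import requests
#   from requests_kerberos import HTTPKerberosAuth, REQUIRED
#
#   def http_get(url):
#       resp = requests.get(
#           url, auth=HTTPKerberosAuth(mutual_authentication=REQUIRED), timeout=10
#       )
#       return resp.status_code, (resp.json() if resp.ok else {})
```

With `auth=None` in the wiring above, the first request would come back as 401 and the loop would raise immediately, which matches the failure seen in the negative test.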
   

