nailo2c commented on PR #65991:
URL: https://github.com/apache/airflow/pull/65991#issuecomment-4547672720
Hi @amoghrajesh, I tested the Kerberos path, and it works as expected.
1. I used a Kerberos-enabled GCP Dataproc cluster and a GCE VM as my test environment.
   1. Dataproc
<img width="1011" height="366" alt="dataproc"
src="https://github.com/user-attachments/assets/c2823fe2-5b39-4c24-b340-2507fba7315d"
/>
   2. GCE (for Airflow/Breeze and the Dataproc master node)
<img width="1423" height="404" alt="gce"
src="https://github.com/user-attachments/assets/558bc123-3379-4cf0-b700-918afffbdd69"
/>
2. Test Dags
<details>
<summary>With RM REST auth</summary>
```python
from datetime import datetime

from requests_kerberos import HTTPKerberosAuth, REQUIRED

from airflow.models import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="spark_yarn_repro_24171_kerberos_rm",
    schedule=None,
    start_date=datetime(2026, 1, 1),
    catchup=False,
    tags=["repro", "issue-24171", "kerberos", "rm-rest"],
):
    SparkSubmitOperator(
        task_id="spark_pi_yarn_cluster_kerberos_rm",
        application="/opt/airflow/dev/.issue-24171/spark/examples/jars/spark-examples_2.12-3.5.3.jar",
        java_class="org.apache.spark.examples.SparkPi",
        application_args=["200"],
        conn_id="spark_yarn_kerberos_rm",
        deploy_mode="cluster",
        # Ship krb5.conf with the job so the driver and executors can reference it.
        files="/opt/airflow/dev/.issue-24171/gcp-kerberos/krb5.conf",
        name="airflow-pi-cluster-kerberos-rm",
        conf={
            "spark.executor.instances": "1",
            "spark.executor.memory": "512m",
            "spark.driver.memory": "512m",
            "spark.driver.extraJavaOptions": "-Djava.security.krb5.conf=krb5.conf",
            "spark.executor.extraJavaOptions": "-Djava.security.krb5.conf=krb5.conf",
        },
        keytab="/opt/airflow/dev/.issue-24171/gcp-kerberos/airflow.keytab",
        principal="airflow",
        use_krb5ccache=True,
        # Track the application through the YARN RM REST API, authenticating with Kerberos.
        yarn_track_via_rm_api=True,
        yarn_rm_auth=HTTPKerberosAuth(mutual_authentication=REQUIRED),
        status_poll_interval=5,
        verbose=True,
    )
```
</details>
<details>
<summary>Without RM REST auth</summary>
```python
from datetime import datetime

from airflow.models import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="spark_yarn_repro_24171_kerberos_rm_noauth",
    schedule=None,
    start_date=datetime(2026, 1, 1),
    catchup=False,
    tags=["repro", "issue-24171", "kerberos", "rm-rest", "negative"],
):
    SparkSubmitOperator(
        task_id="spark_pi_yarn_cluster_kerberos_rm_noauth",
        application="/opt/airflow/dev/.issue-24171/spark/examples/jars/spark-examples_2.12-3.5.3.jar",
        java_class="org.apache.spark.examples.SparkPi",
        application_args=["200"],
        conn_id="spark_yarn_kerberos_rm",
        deploy_mode="cluster",
        # Ship krb5.conf with the job so the driver and executors can reference it.
        files="/opt/airflow/dev/.issue-24171/gcp-kerberos/krb5.conf",
        name="airflow-pi-cluster-kerberos-rm-noauth",
        conf={
            "spark.executor.instances": "1",
            "spark.executor.memory": "512m",
            "spark.driver.memory": "512m",
            "spark.driver.extraJavaOptions": "-Djava.security.krb5.conf=krb5.conf",
            "spark.executor.extraJavaOptions": "-Djava.security.krb5.conf=krb5.conf",
        },
        keytab="/opt/airflow/dev/.issue-24171/gcp-kerberos/airflow.keytab",
        principal="airflow",
        use_krb5ccache=True,
        # Negative case: RM REST tracking is enabled, but no auth is passed.
        yarn_track_via_rm_api=True,
        yarn_rm_auth=None,
        status_poll_interval=1,
        verbose=True,
    )
```
</details>
3. Results
   1. The Dag with `yarn_rm_auth` succeeded. The Dag without `yarn_rm_auth` failed with HTTP 401 while polling the YARN RM REST API, as expected.
<img width="1427" height="599" alt="airflow_ui"
src="https://github.com/user-attachments/assets/e8dc769e-7017-4135-b864-60d95579ce9c"
/>
<img width="1454" height="933" alt="401_auth_error"
src="https://github.com/user-attachments/assets/050a8b16-54f5-467f-ad42-e93a18ce0a05"
/>
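
To illustrate the two outcomes above, here is a minimal sketch of how a single reply from the YARN RM application endpoint (`/ws/v1/cluster/apps/{appid}`) could be interpreted. It is not the code from this PR: `interpret_rm_response` and `TERMINAL_STATES` are hypothetical names, and the sketch assumes the reply body has the usual `{"app": {"state": ...}}` shape. The 401 branch corresponds to what the no-auth Dag ran into.

```python
import json

# Assumed set of application states after which polling can stop.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def interpret_rm_response(status_code: int, body: str) -> tuple[str, bool]:
    """Hypothetical helper: map one RM REST reply to (state, is_terminal)."""
    if status_code == 401:
        # The RM rejected the request because it carried no valid credentials.
        raise PermissionError("YARN RM returned HTTP 401: request was not authenticated")
    if status_code != 200:
        raise RuntimeError(f"Unexpected HTTP {status_code} from YARN RM")
    state = json.loads(body)["app"]["state"]
    return state, state in TERMINAL_STATES


if __name__ == "__main__":
    # A running application is reported as non-terminal, so polling would continue.
    print(interpret_rm_response(200, '{"app": {"state": "RUNNING"}}'))
```

In a real poll loop the status code and body would come from an HTTP client call that carries the Kerberos auth object. That part is omitted here so the sketch runs without a cluster.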