georgehu0815 opened a new issue, #29813:
URL: https://github.com/apache/airflow/issues/29813
### Apache Airflow version
2.5.1
### What happened
Extra: `{"master": "local[2]", "namespace": "default", "deploy-mode": "client", "spark-binary": "spark-submit"}`

The task fails; note the `JAVA_HOME is not set` line in the log:

```
AIRFLOW_CTX_DAG_OWNER=***
AIRFLOW_CTX_DAG_ID=example_spark_operator
AIRFLOW_CTX_TASK_ID=submit_job1
AIRFLOW_CTX_EXECUTION_DATE=2023-02-28T16:15:18.549640+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-02-28T16:15:18.549640+00:00
[2023-02-28, 16:15:19 UTC] {base.py:68} INFO - Using connection ID 'spark_local' for task execution.
[2023-02-28, 16:15:19 UTC] {spark_submit.py:344} INFO - Spark-Submit cmd: spark-submit --master local[1] --conf spark.driver.maxResultSize=1g --name arrow-spark --deploy-mode client ${SPARK_HOME}/examples/src/main/python/pi.py
[2023-02-28, 16:15:19 UTC] {spark_submit.py:495} INFO - JAVA_HOME is not set
[2023-02-28, 16:15:19 UTC] {taskinstance.py:1889} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/apache/spark/operators/spark_submit.py", line 157, in execute
    self._hook.submit(self._application)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/apache/spark/hooks/spark_submit.py", line 427, in submit
    f"Cannot execute: {self._mask_cmd(spark_submit_cmd)}. Error code is: {returncode}."
airflow.exceptions.AirflowException: Cannot execute: spark-submit --master local[1] --conf spark.driver.maxResultSize=1g --name arrow-spark --deploy-mode client ${SPARK_HOME}/examples/src/main/python/pi.py. Error code is: 1.
[2023-02-28, 16:15:19 UTC] {taskinstance.py:1400} INFO - Marking task as FAILED. dag_id=example_spark_operator, task_id=submit_job1, execution_date=20230228T161518, start_date=20230228T161519, end_date=20230228T161519
[2023-02-28, 16:15:19 UTC] {standard_task_runner.py:97} ERROR - Failed to execute job 9 for task submit_job1 (Cannot execute: spark-submit --master local[1] --conf spark.driver.maxResultSize=1g --name arrow-spark --deploy-mode client ${SPARK_HOME}/examples/src/main/python/pi.py. Error code is: 1.; 1125)
[2023-02-28, 16:15:19 UTC] {local_task_job.py:156} INFO - Task exited with return code 1
[2023-02-28, 16:15:19 UTC] {local_task_job.py:273} INFO - 0 downstream tasks scheduled from follow-on schedule check
```
### What you think should happen instead
The following command runs successfully inside the airflow-scheduler Docker container:

```
spark-submit --master local[1] --conf spark.driver.maxResultSize=1g --name arrow-spark --deploy-mode client ${SPARK_HOME}/examples/src/main/python/pi.py
```

It should also run successfully when the task is triggered from the Airflow UI.
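Since the log shows `JAVA_HOME is not set` even though the same command works in an interactive shell, it helps to check what environment the task process actually sees. A minimal diagnostic sketch (the function name is my own, not an Airflow API) that can be run from a `PythonOperator` or via `docker exec` in the scheduler container:

```python
import os
import shutil


def spark_env_report() -> dict:
    """Report the environment facts that spark-submit depends on."""
    return {
        "JAVA_HOME": os.environ.get("JAVA_HOME"),  # None would reproduce the warning in the log
        "SPARK_HOME": os.environ.get("SPARK_HOME"),
        "spark_submit_on_path": shutil.which("spark-submit"),  # None means spark-submit is not on PATH
    }


print(spark_env_report())
```

If the report differs between the interactive shell and the task process, the variables set in docker-compose are not reaching the task.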
### How to reproduce
Set up a Spark connection:

- Connection Id: `spark_local`
- Connection Type: `spark`
- Host: `local[1]`
- Extra: `{"master": "local[2]", "namespace": "default", "deploy-mode": "client", "spark-binary": "spark-submit"}`
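The Extra field must be valid JSON; a quoting mistake there is easy to make when pasting into the UI. One way to generate it without hand-escaping (plain Python, nothing Airflow-specific):

```python
import json

# Connection "Extra" for spark_local; keys mirror the ones in this report.
extra = {
    "master": "local[2]",
    "namespace": "default",
    "deploy-mode": "client",
    "spark-binary": "spark-submit",
}
print(json.dumps(extra))
```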
Run the following DAG:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

DAG_ID = "example_spark_operator"

# Spark-Submit cmd: spark-submit --master local --name arrow-spark ${SPARK_HOME}/examples/src/main/python/pi.py
with DAG(
    dag_id=DAG_ID,
    start_date=datetime(2023, 2, 26),
    catchup=False,
    params={
        "master": "local",
        # Master can be local, yarn, spark://HOST:PORT, mesos://HOST:PORT
        # and k8s://https://<HOST>:<PORT>
        "deploy-mode": "client",
    },
) as dag:
    submit_job = SparkSubmitOperator(
        application="${SPARK_HOME}/examples/src/main/python/pi.py",
        conn_id="spark_local",
        task_id="submit_job1",
        dag=dag,
        # env_vars={"JAVA_HOME": "/opt/airflow/jdk-11", "SPARK_HOME": "/opt/airflow/jdk-11"},
        conf={
            # "spark.yarn.appMasterEnv.JAVA_HOME": "/opt/airflow/jdk-11",
            "spark.driver.maxResultSize": "1g",
        },
    )

    submit_job
```
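The commented-out `env_vars` argument in the DAG above is the usual knob for this: based on my reading of the spark provider's hook (hedged, not verified against every version), for a `local` master the `env_vars` dict is handed to the spawned `spark-submit` subprocess as its environment. The underlying mechanism can be demonstrated with plain `subprocess`, no Airflow needed; the JDK path below is the one from the compose file in this report:

```python
import os
import subprocess
import sys

# Environment we want the child (stand-in for spark-submit) to see.
child_env = {**os.environ, "JAVA_HOME": "/opt/airflow/jdk-11"}

# The child process reports its own JAVA_HOME back to us.
out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('JAVA_HOME', 'unset'))"],
    env=child_env,
    capture_output=True,
    text=True,
)
print(out.stdout.strip())  # → /opt/airflow/jdk-11
```

If `env_vars` is left unset, the subprocess inherits only what the task runner's own environment contains, which would explain the `JAVA_HOME is not set` warning when the compose-level variable does not propagate to the task process.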
### Operating System
Windows 11 with Windows Subsystem for Linux (WSL)
### Versions of Apache Airflow Providers
2.3.0
### Deployment
Docker-Compose
### Deployment details
```yaml
---
version: '3.4'

x-common:
  &common
  image: apache/airflow:2.3.0
  user: "${AIRFLOW_UID}:0"
  environment:
    - JAVA_HOME=/opt/airflow/jdk-11
  env_file:
    - .env
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - /var/run/docker.sock:/var/run/docker.sock

x-depends-on:
  &depends-on
  depends_on:
    postgres:
      condition: service_healthy
    airflow-init:
      condition: service_completed_successfully

services:
  postgres:
    image: postgres:13
    container_name: postgres
    ports:
      - "5434:5432"
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    env_file:
      - .env

  scheduler:
    <<: *common
    <<: *depends-on
    container_name: airflow-scheduler
    command: scheduler
    restart: on-failure
    ports:
      - "8793:8793"
      - "4040:4040"

  webserver:
    <<: *common
    <<: *depends-on
    container_name: airflow-webserver
    restart: always
    command: webserver
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s
      timeout: 30s
      retries: 5

  airflow-init:
    <<: *common
    container_name: airflow-init
    entrypoint: /bin/bash
    command:
      - -c
      - |
        mkdir -p /sources/logs1 /sources/dags1 /sources/plugins1
        chown -R "${AIRFLOW_UID}:0" /sources/{logs1,dags1,plugins1}
        exec /entrypoint airflow version
```
### Anything else
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)