georgehu0815 opened a new issue, #29813:
URL: https://github.com/apache/airflow/issues/29813

   ### Apache Airflow version
   
   2.5.1
   
   ### What happened
   
   Extra: {"master": "local[2]", "namespace": "default", "deploy-mode": "client", "spark-binary": "spark-submit"}
   
   AIRFLOW_CTX_DAG_OWNER=***
   AIRFLOW_CTX_DAG_ID=example_spark_operator
   AIRFLOW_CTX_TASK_ID=submit_job1
   AIRFLOW_CTX_EXECUTION_DATE=2023-02-28T16:15:18.549640+00:00
   AIRFLOW_CTX_TRY_NUMBER=1
   AIRFLOW_CTX_DAG_RUN_ID=manual__2023-02-28T16:15:18.549640+00:00
   [2023-02-28, 16:15:19 UTC] {base.py:68} INFO - Using connection ID 
'spark_local' for task execution.
   [2023-02-28, 16:15:19 UTC] {spark_submit.py:344} INFO - Spark-Submit cmd: 
spark-submit --master local[1] --conf spark.driver.maxResultSize=1g --name 
arrow-spark --deploy-mode client ${SPARK_HOME}/examples/src/main/python/pi.py
   **[2023-02-28, 16:15:19 UTC] {spark_submit.py:495} INFO - JAVA_HOME is not 
set**
   [2023-02-28, 16:15:19 UTC] {taskinstance.py:1889} ERROR - Task failed with 
exception
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/apache/spark/operators/spark_submit.py",
 line 157, in execute
       self._hook.submit(self._application)
     File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/apache/spark/hooks/spark_submit.py",
 line 427, in submit
       f"Cannot execute: {self._mask_cmd(spark_submit_cmd)}. Error code is: 
{returncode}."
   airflow.exceptions.AirflowException: Cannot execute: spark-submit --master 
local[1] --conf spark.driver.maxResultSize=1g --name arrow-spark --deploy-mode 
client ${SPARK_HOME}/examples/src/main/python/pi.py. Error code is: 1.
   [2023-02-28, 16:15:19 UTC] {taskinstance.py:1400} INFO - Marking task as 
FAILED. dag_id=example_spark_operator, task_id=submit_job1, 
execution_date=20230228T161518, start_date=20230228T161519, 
end_date=20230228T161519
   [2023-02-28, 16:15:19 UTC] {standard_task_runner.py:97} ERROR - Failed to 
execute job 9 for task submit_job1 (Cannot execute: spark-submit --master 
local[1] --conf spark.driver.maxResultSize=1g --name arrow-spark --deploy-mode 
client ${SPARK_HOME}/examples/src/main/python/pi.py. Error code is: 1.; 1125)
   [2023-02-28, 16:15:19 UTC] {local_task_job.py:156} INFO - Task exited with 
return code 1
   [2023-02-28, 16:15:19 UTC] {local_task_job.py:273} INFO - 0 downstream tasks 
scheduled from follow-on schedule check
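   The highlighted `JAVA_HOME is not set` line is the likely root cause: the environment the task runner hands to spark-submit is not the same as an interactive shell in the container. A small diagnostic sketch (plain Python, no Airflow imports, suitable as a `PythonOperator` callable; the variables listed are simply the ones spark-submit depends on):

   ```python
   import os
   import shutil

   def report_spark_env() -> dict:
       """Print and return the environment details spark-submit depends on."""
       details = {
           "JAVA_HOME": os.environ.get("JAVA_HOME"),
           "SPARK_HOME": os.environ.get("SPARK_HOME"),
           # shutil.which returns None when spark-submit is not on PATH
           "spark-submit": shutil.which("spark-submit"),
       }
       for key, value in details.items():
           print(f"{key}={value}")
       return details

   report_spark_env()
   ```

   Running this from a task and comparing the output with the same call in `docker exec` into the scheduler container shows exactly which variables differ.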
   
   
   
   
   ### What you think should happen instead
   
   The following command runs successfully inside the airflow-scheduler Docker environment:
   spark-submit --master local[1] --conf spark.driver.maxResultSize=1g --name arrow-spark --deploy-mode client ${SPARK_HOME}/examples/src/main/python/pi.py
   
   It should also run successfully when triggered from the Airflow UI.
   
   
   ### How to reproduce
   
   Set up a `spark_local` connection:
   
   Connection Id: spark_local
   Connection Type: spark
   Host: local[1]
   Extra: {"master": "local[2]", "namespace": "default", "deploy-mode": "client", "spark-binary": "spark-submit"}
   
   
   Run the following DAG:
   
   
   from datetime import datetime
   
   from airflow import DAG
   from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
   
   DAG_ID = "example_spark_operator"
   # Spark-Submit cmd: spark-submit --master local --name arrow-spark ${SPARK_HOME}/examples/src/main/python/pi.py
   with DAG(
       dag_id=DAG_ID,
       start_date=datetime(2023, 2, 26),
       catchup=False,
       params={
           # Master can be local, yarn, spark://HOST:PORT, mesos://HOST:PORT
           # or k8s://https://<HOST>:<PORT>
           "master": "local",
           "deploy-mode": "client",
       },
   ) as dag:
       submit_job = SparkSubmitOperator(
           application="${SPARK_HOME}/examples/src/main/python/pi.py",
           conn_id="spark_local",
           task_id="submit_job1",
           # env_vars={"JAVA_HOME": "/opt/airflow/jdk-11", "SPARK_HOME": "/opt/airflow/jdk-11"},
           conf={
               # "spark.yarn.appMasterEnv.JAVA_HOME": "/opt/airflow/jdk-11",
               "spark.driver.maxResultSize": "1g"
           },
       )
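   In client mode the hook launches spark-submit as a child process, and a child process only sees the environment mapping it is explicitly handed, which is why `JAVA_HOME` can be set in an interactive shell yet be missing at task run time. A minimal plain-Python illustration of that mechanism (no Airflow required; the `/opt/airflow/jdk-11` path is the one from the compose file below and is only an example):

   ```python
   import os
   import subprocess
   import sys

   # A child that reports what it sees, standing in for spark-submit
   child = [
       sys.executable,
       "-c",
       "import os; print(os.environ.get('JAVA_HOME', 'NOT SET'))",
   ]
   base_env = {"PATH": os.environ.get("PATH", "")}

   # Without JAVA_HOME in the mapping, the child cannot see it,
   # no matter what the parent shell has exported
   bare = subprocess.run(child, env=base_env, capture_output=True, text=True)
   print(bare.stdout.strip())  # NOT SET

   # Forwarding it explicitly, which is what env_vars= does for the operator
   forwarded = subprocess.run(
       child,
       env={**base_env, "JAVA_HOME": "/opt/airflow/jdk-11"},
       capture_output=True,
       text=True,
   )
   print(forwarded.stdout.strip())  # /opt/airflow/jdk-11
   ```

   If this is the cause here, passing `env_vars={"JAVA_HOME": ...}` to `SparkSubmitOperator` (as in the commented-out line above) is the usual way to forward the variable to the spark-submit subprocess.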
   
   
   ### Operating System
   
   Windows 11 with Windows Subsystem for Linux (WSL)
   
   ### Versions of Apache Airflow Providers
   
   2.3.0
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   ---
   version: '3.4'
   
   x-common:
     &common
     image: apache/airflow:2.3.0
   
     user: "${AIRFLOW_UID}:0"
     environment:
         - JAVA_HOME=/opt/airflow/jdk-11
     env_file: 
       - .env
     volumes:
       - ./dags:/opt/airflow/dags
       - ./logs:/opt/airflow/logs
       - ./plugins:/opt/airflow/plugins
       - /var/run/docker.sock:/var/run/docker.sock
   
   x-depends-on:
     &depends-on
     depends_on:
       postgres:
         condition: service_healthy
       airflow-init:
         condition: service_completed_successfully
   
   services:
     postgres:
       image: postgres:13
       container_name: postgres
       ports:
         - "5434:5432"
       healthcheck:
         test: ["CMD", "pg_isready", "-U", "airflow"]
         interval: 5s
         retries: 5
       env_file:
         - .env
   
     scheduler:
    <<: [*common, *depends-on]
       container_name: airflow-scheduler
       command: scheduler
       restart: on-failure
       ports:
         - "8793:8793"
         - "4040:4040"
   
     webserver:
    <<: [*common, *depends-on]
       container_name: airflow-webserver
       restart: always
       command: webserver
       ports:
         - "8080:8080"
       healthcheck:
         test: ["CMD", "curl", "--fail", "http://localhost:8080/health";]
         interval: 30s
         timeout: 30s
         retries: 5
     
     airflow-init:
       <<: *common
       container_name: airflow-init
       entrypoint: /bin/bash
       command:
         - -c
         - |
           mkdir -p /sources/logs1 /sources/dags1 /sources/plugins1
           chown -R "${AIRFLOW_UID}:0" /sources/{logs1,dags1,plugins1}
           exec /entrypoint airflow version
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

