[ https://issues.apache.org/jira/browse/AIRFLOW-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879935#comment-16879935 ]

ASF GitHub Bot commented on AIRFLOW-4906:
-----------------------------------------

Fokko commented on pull request #5542: [AIRFLOW-4906] Improve debugging for the 
SparkSubmitHook
URL: https://github.com/apache/airflow/pull/5542
 
 
   Make sure you have checked _all_ steps below.
   
   Before:
   ```
   [2019-07-07 18:03:34,465] {__init__.py:1139} INFO - Dependencies all met for 
<TaskInstance: hello_spark.compute_pi 2019-07-04T00:00:00+00:00 [scheduled]>
   [2019-07-07 18:03:34,486] {__init__.py:1139} INFO - Dependencies all met for 
<TaskInstance: hello_spark.compute_pi 2019-07-04T00:00:00+00:00 [scheduled]>
   [2019-07-07 18:03:34,487] {__init__.py:1353} INFO - 
   
--------------------------------------------------------------------------------
   [2019-07-07 18:03:34,488] {__init__.py:1354} INFO - Starting attempt 2 of 2
   [2019-07-07 18:03:34,488] {__init__.py:1355} INFO - 
   
--------------------------------------------------------------------------------
   [2019-07-07 18:03:34,526] {__init__.py:1374} INFO - Executing 
<Task(SparkSubmitOperator): compute_pi> on 2019-07-04T00:00:00+00:00
   [2019-07-07 18:03:34,530] {base_task_runner.py:119} INFO - Running: 
['airflow', 'run', 'hello_spark', 'compute_pi', '2019-07-04T00:00:00+00:00', 
'--job_id', '36', '--raw', '-sd', 'DAGS_FOLDER/spark.py', '--cfg_path', 
'/tmp/tmpviu_vxcj']
   [2019-07-07 18:03:36,961] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi [2019-07-07 18:03:36,956] {settings.py:182} INFO - 
settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, 
pid=15
   [2019-07-07 18:03:37,520] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi [2019-07-07 18:03:37,508] {__init__.py:51} INFO - Using executor 
LocalExecutor
   [2019-07-07 18:03:38,325] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi [2019-07-07 18:03:38,323] {__init__.py:305} INFO - Filling up the 
DagBag from /root/airflow/dags/spark.py
   [2019-07-07 18:03:39,236] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi [2019-07-07 18:03:39,234] {cli.py:517} INFO - Running <TaskInstance: 
hello_spark.compute_pi 2019-07-04T00:00:00+00:00 [running]> on host 
hellosparkcomputepi-e6e6c0eba6d248848ebbe6d3102b29d1
   [2019-07-07 18:03:39,403] {logging_mixin.py:95} INFO - [2019-07-07 
18:03:39,403] {base_hook.py:83} INFO - Using connection to: id: local-spark. 
Host: k8s://http://192.168.1.113, Port: 8080, Schema: None, Login: None, 
Password: None, extra: {'spark_home': '/usr/spark', 'deploy-mode': 'cluster'}
   [2019-07-07 18:03:39,407] {logging_mixin.py:95} INFO - [2019-07-07 
18:03:39,405] {spark_submit_hook.py:295} INFO - Spark-Submit cmd: 
['spark-submit', '--master', 'k8s://http://192.168.1.113:8080', '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark', 
'--deploy-mode', 'cluster', 'local:///code/pi.py']
   [2019-07-07 18:03:45,170] {__init__.py:1580} ERROR - Cannot execute: 
['spark-submit', '--master', 'k8s://http://192.168.1.113:8080', '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark', 
'--deploy-mode', 'cluster', 'local:///code/pi.py']. Error code is: 1.
   Traceback (most recent call last):
     File "/usr/local/lib/python3.5/dist-packages/airflow/models/__init__.py", 
line 1441, in _run_raw_task
       result = task_copy.execute(context=context)
     File 
"/usr/local/lib/python3.5/dist-packages/airflow/contrib/operators/spark_submit_operator.py",
 line 176, in execute
       self._hook.submit(self._application)
     File 
"/usr/local/lib/python3.5/dist-packages/airflow/contrib/hooks/spark_submit_hook.py",
 line 352, in submit
       spark_submit_cmd, returncode
   airflow.exceptions.AirflowException: Cannot execute: ['spark-submit', 
'--master', 'k8s://http://192.168.1.113:8080', '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark', 
'--deploy-mode', 'cluster', 'local:///code/pi.py']. Error code is: 1.
   [2019-07-07 18:03:45,175] {__init__.py:1611} INFO - Marking task as FAILED.
   [2019-07-07 18:03:45,206] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi /usr/local/lib/python3.5/dist-packages/psycopg2/__init__.py:144: 
UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in 
order to keep installing from binary please use "pip install psycopg2-binary" 
instead. For details see: 
<http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
   [2019-07-07 18:03:45,207] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi   """)
   [2019-07-07 18:03:45,209] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi Traceback (most recent call last):
   [2019-07-07 18:03:45,210] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi   File "/usr/local/bin/airflow", line 32, in <module>
   [2019-07-07 18:03:45,211] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi     args.func(args)
   [2019-07-07 18:03:45,211] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi   File 
"/usr/local/lib/python3.5/dist-packages/airflow/utils/cli.py", line 74, in 
wrapper
   [2019-07-07 18:03:45,212] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi     return f(*args, **kwargs)
   [2019-07-07 18:03:45,213] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi   File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", 
line 523, in run
   [2019-07-07 18:03:45,213] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi     _run(args, dag, ti)
   [2019-07-07 18:03:45,214] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi   File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", 
line 442, in _run
   [2019-07-07 18:03:45,215] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi     pool=args.pool,
   [2019-07-07 18:03:45,215] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi   File "/usr/local/lib/python3.5/dist-packages/airflow/utils/db.py", 
line 73, in wrapper
   [2019-07-07 18:03:45,216] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi     return func(*args, **kwargs)
   [2019-07-07 18:03:45,217] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi   File 
"/usr/local/lib/python3.5/dist-packages/airflow/models/__init__.py", line 1441, 
in _run_raw_task
   [2019-07-07 18:03:45,217] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi     result = task_copy.execute(context=context)
   [2019-07-07 18:03:45,218] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi   File 
"/usr/local/lib/python3.5/dist-packages/airflow/contrib/operators/spark_submit_operator.py",
 line 176, in execute
   [2019-07-07 18:03:45,219] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi     self._hook.submit(self._application)
   [2019-07-07 18:03:45,220] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi   File 
"/usr/local/lib/python3.5/dist-packages/airflow/contrib/hooks/spark_submit_hook.py",
 line 352, in submit
   [2019-07-07 18:03:45,220] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi     spark_submit_cmd, returncode
   [2019-07-07 18:03:45,221] {base_task_runner.py:101} INFO - Job 36: Subtask 
compute_pi airflow.exceptions.AirflowException: Cannot execute: 
['spark-submit', '--master', 'k8s://http://192.168.1.113:8080', '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark', 
'--deploy-mode', 'cluster', 'local:///code/pi.py']. Error code is: 1.
   [2019-07-07 18:03:49,544] {logging_mixin.py:95} INFO - [2019-07-07 
18:03:49,542] {jobs.py:2562} INFO - Task exited with return code 1
   ```
   
   Becomes:
   ```
   [2019-07-07 19:13:08,038] {__init__.py:1139} INFO - Dependencies all met for 
<TaskInstance: hello_spark.compute_pi 2019-07-04T00:00:00+00:00 [queued]>
   [2019-07-07 19:13:08,102] {__init__.py:1139} INFO - Dependencies all met for 
<TaskInstance: hello_spark.compute_pi 2019-07-04T00:00:00+00:00 [queued]>
   [2019-07-07 19:13:08,105] {__init__.py:1353} INFO - 
   
--------------------------------------------------------------------------------
   [2019-07-07 19:13:08,107] {__init__.py:1354} INFO - Starting attempt 3 of 3
   [2019-07-07 19:13:08,112] {__init__.py:1355} INFO - 
   
--------------------------------------------------------------------------------
   [2019-07-07 19:13:08,209] {__init__.py:1374} INFO - Executing 
<Task(SparkSubmitOperator): compute_pi> on 2019-07-04T00:00:00+00:00
   [2019-07-07 19:13:08,211] {base_task_runner.py:119} INFO - Running: 
['airflow', 'run', 'hello_spark', 'compute_pi', '2019-07-04T00:00:00+00:00', 
'--job_id', '42', '--raw', '-sd', 'DAGS_FOLDER/spark.py', '--cfg_path', 
'/tmp/tmpvrp9os2m']
   [2019-07-07 19:13:10,715] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi [2019-07-07 19:13:10,715] {settings.py:182} INFO - 
settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800, 
pid=15
   [2019-07-07 19:13:10,925] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi [2019-07-07 19:13:10,924] {__init__.py:51} INFO - Using executor 
LocalExecutor
   [2019-07-07 19:13:11,432] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi [2019-07-07 19:13:11,431] {__init__.py:305} INFO - Filling up the 
DagBag from /root/airflow/dags/spark.py
   [2019-07-07 19:13:12,184] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi [2019-07-07 19:13:12,182] {cli.py:517} INFO - Running <TaskInstance: 
hello_spark.compute_pi 2019-07-04T00:00:00+00:00 [running]> on host 
hellosparkcomputepi-477f1a3c92534128a7dc03183888b1a8
   [2019-07-07 19:13:12,283] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:12,283] {base_hook.py:83} INFO - Using connection to: id: local-spark. 
Host: k8s://http://192.168.1.113, Port: 8080, Schema: None, Login: None, 
Password: None, extra: {'spark_home': '/usr/spark', 'deploy-mode': 'cluster'}
   [2019-07-07 19:13:12,286] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:12,284] {spark_submit_hook.py:295} INFO - Spark-Submit cmd: 
['spark-submit', '--master', 'k8s://http://192.168.1.113:8080', '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark', 
'--deploy-mode', 'cluster', 'local:///code/pi.py']
   [2019-07-07 19:13:14,983] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:14,983] {spark_submit_hook.py:426} INFO - 19/07/07 19:13:14 WARN Utils: 
Kubernetes master URL uses HTTP instead of HTTPS.
   [2019-07-07 19:13:16,296] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:16,295] {spark_submit_hook.py:426} INFO - log4j:WARN No appenders could 
be found for logger (io.fabric8.kubernetes.client.Config).
   [2019-07-07 19:13:16,297] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:16,297] {spark_submit_hook.py:426} INFO - log4j:WARN Please initialize 
the log4j system properly.
   [2019-07-07 19:13:16,298] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:16,298] {spark_submit_hook.py:426} INFO - log4j:WARN See 
http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
   [2019-07-07 19:13:17,114] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,114] {spark_submit_hook.py:426} INFO - Exception in thread "main" 
org.apache.spark.SparkException: Must specify the driver container image
   [2019-07-07 19:13:17,121] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,121] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep$$anonfun$3.apply(BasicDriverFeatureStep.scala:42)
   [2019-07-07 19:13:17,123] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,123] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep$$anonfun$3.apply(BasicDriverFeatureStep.scala:42)
   [2019-07-07 19:13:17,128] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,127] {spark_submit_hook.py:426} INFO - at 
scala.Option.getOrElse(Option.scala:121)
   [2019-07-07 19:13:17,130] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,130] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.<init>(BasicDriverFeatureStep.scala:42)
   [2019-07-07 19:13:17,132] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,131] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder$$anonfun$$lessinit$greater$default$1$1.apply(KubernetesDriverBuilder.scala:25)
   [2019-07-07 19:13:17,135] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,135] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder$$anonfun$$lessinit$greater$default$1$1.apply(KubernetesDriverBuilder.scala:25)
   [2019-07-07 19:13:17,138] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,138] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:59)
   [2019-07-07 19:13:17,143] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,143] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:110)
   [2019-07-07 19:13:17,145] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,144] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
   [2019-07-07 19:13:17,146] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,145] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
   [2019-07-07 19:13:17,148] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,147] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
   [2019-07-07 19:13:17,151] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,151] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
   [2019-07-07 19:13:17,155] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,155] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
   [2019-07-07 19:13:17,156] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,156] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
   [2019-07-07 19:13:17,158] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,157] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
   [2019-07-07 19:13:17,160] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,160] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
   [2019-07-07 19:13:17,161] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,161] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
   [2019-07-07 19:13:17,162] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,161] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
   [2019-07-07 19:13:17,162] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,162] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
   [2019-07-07 19:13:17,163] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:17,163] {spark_submit_hook.py:426} INFO - at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   [2019-07-07 19:13:17,575] {__init__.py:1580} ERROR - Cannot execute: 
['spark-submit', '--master', 'k8s://http://192.168.1.113:8080', '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark', 
'--deploy-mode', 'cluster', 'local:///code/pi.py']. Error code is: 1.
   Traceback (most recent call last):
     File "/usr/local/lib/python3.5/dist-packages/airflow/models/__init__.py", 
line 1441, in _run_raw_task
       result = task_copy.execute(context=context)
     File "/root/airflow/dags/operators/spark_submit_operator.py", line 176, in 
execute
       self._hook.submit(self._application)
     File "/root/airflow/dags/hooks/spark_submit_hook.py", line 352, in submit
       spark_submit_cmd, returncode
   airflow.exceptions.AirflowException: Cannot execute: ['spark-submit', 
'--master', 'k8s://http://192.168.1.113:8080', '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark', 
'--deploy-mode', 'cluster', 'local:///code/pi.py']. Error code is: 1.
   [2019-07-07 19:13:17,594] {__init__.py:1611} INFO - Marking task as FAILED.
   [2019-07-07 19:13:17,626] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi /usr/local/lib/python3.5/dist-packages/psycopg2/__init__.py:144: 
UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in 
order to keep installing from binary please use "pip install psycopg2-binary" 
instead. For details see: 
<http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
   [2019-07-07 19:13:17,627] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi   """)
   [2019-07-07 19:13:17,640] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi Traceback (most recent call last):
   [2019-07-07 19:13:17,646] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi   File "/usr/local/bin/airflow", line 32, in <module>
   [2019-07-07 19:13:17,648] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi     args.func(args)
   [2019-07-07 19:13:17,649] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi   File 
"/usr/local/lib/python3.5/dist-packages/airflow/utils/cli.py", line 74, in 
wrapper
   [2019-07-07 19:13:17,649] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi     return f(*args, **kwargs)
   [2019-07-07 19:13:17,651] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi   File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", 
line 523, in run
   [2019-07-07 19:13:17,652] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi     _run(args, dag, ti)
   [2019-07-07 19:13:17,652] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi   File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", 
line 442, in _run
   [2019-07-07 19:13:17,653] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi     pool=args.pool,
   [2019-07-07 19:13:17,653] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi   File "/usr/local/lib/python3.5/dist-packages/airflow/utils/db.py", 
line 73, in wrapper
   [2019-07-07 19:13:17,656] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi     return func(*args, **kwargs)
   [2019-07-07 19:13:17,657] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi   File 
"/usr/local/lib/python3.5/dist-packages/airflow/models/__init__.py", line 1441, 
in _run_raw_task
   [2019-07-07 19:13:17,662] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi     result = task_copy.execute(context=context)
   [2019-07-07 19:13:17,663] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi   File "/root/airflow/dags/operators/spark_submit_operator.py", line 
176, in execute
   [2019-07-07 19:13:17,664] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi     self._hook.submit(self._application)
   [2019-07-07 19:13:17,665] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi   File "/root/airflow/dags/hooks/spark_submit_hook.py", line 352, in 
submit
   [2019-07-07 19:13:17,668] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi     spark_submit_cmd, returncode
   [2019-07-07 19:13:17,669] {base_task_runner.py:101} INFO - Job 42: Subtask 
compute_pi airflow.exceptions.AirflowException: Cannot execute: 
['spark-submit', '--master', 'k8s://http://192.168.1.113:8080', '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
 '--conf', 
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
 '--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark', 
'--deploy-mode', 'cluster', 'local:///code/pi.py']. Error code is: 1.
   [2019-07-07 19:13:18,137] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:18,137] {jobs.py:2630} WARNING - State of this instance has been 
externally set to failed. Taking the poison pill.
   [2019-07-07 19:13:18,157] {helpers.py:281} INFO - Sending Signals.SIGTERM to 
GPID 15
   [2019-07-07 19:13:18,238] {helpers.py:263} INFO - Process 
psutil.Process(pid=15, status='terminated') (15) terminated with exit code -15
   [2019-07-07 19:13:18,270] {logging_mixin.py:95} INFO - [2019-07-07 
19:13:18,259] {jobs.py:2562} INFO - Task exited with return code 0
   ```
   
   Without this change, it is very hard to find errors in the configuration (in this case, the missing driver container image).
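
   The fix boils down to streaming spark-submit's stdout/stderr into the task log instead of discarding it. Below is a minimal sketch of that idea only; the names `submit` and `RuntimeError` stand in for the real hook method and `AirflowException`, and the actual implementation in `airflow/contrib/hooks/spark_submit_hook.py` does considerably more (connection resolution, driver status tracking, etc.).

```python
import logging
import subprocess

log = logging.getLogger(__name__)


def submit(spark_submit_cmd):
    """Run spark-submit and forward its output to the log, line by line.

    A simplified stand-in for SparkSubmitHook.submit(); error handling is
    reduced to a plain RuntimeError for illustration.
    """
    # Merge stderr into stdout so warnings and stack traces are not lost.
    proc = subprocess.Popen(
        spark_submit_cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    # Stream every line into the task log as it is produced, instead of
    # swallowing the subprocess output and reporting only the exit code.
    for line in proc.stdout:
        log.info(line.rstrip())
    returncode = proc.wait()
    if returncode:
        raise RuntimeError(
            "Cannot execute: {}. Error code is: {}".format(
                spark_submit_cmd, returncode
            )
        )
```

   With this in place, a failure such as "Must specify the driver container image" shows up in the task log right before the exception, as in the "Becomes" excerpt above.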
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-4906\] My Airflow PR"
     - https://issues.apache.org/jira/browse/AIRFLOW-4906
     - In case you are fixing a typo in the documentation, you can prepend your commit with \[AIRFLOW-4906\]; code changes always need a Jira issue.
     - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
     - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
     - All the public functions and the classes in the PR contain docstrings that explain what they do
     - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Improve debugging for the SparkSubmitHook
> -----------------------------------------
>
>                 Key: AIRFLOW-4906
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4906
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hooks
>    Affects Versions: 1.10.3
>            Reporter: Fokko Driesprong
>            Assignee: Fokko Driesprong
>            Priority: Major
>             Fix For: 2.0.0
>
>
> Currently, the output of the spark-submit command is not being sent to the 
> logs. This makes debugging of the k8s jobs rather hard. For example, if you 
> make a typo, you will only get the exit code, which is non-descriptive.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
