[
https://issues.apache.org/jira/browse/AIRFLOW-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879935#comment-16879935
]
ASF GitHub Bot commented on AIRFLOW-4906:
-----------------------------------------
Fokko commented on pull request #5542: [AIRFLOW-4906] Improve debugging for the
SparkSubmitHook
URL: https://github.com/apache/airflow/pull/5542
Make sure you have checked _all_ steps below.
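For context, the logs below come from a SparkSubmitOperator task that submits a PySpark job to Kubernetes. A rough reconstruction of that task is sketched here; the dag_id, task_id, connection id, Spark conf and application are read off the log lines below, while the schedule and start date are guesses:
```
# Hypothetical reconstruction of the task behind the logs below; the schedule
# and start date are assumptions, everything else is taken from the log output.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

with DAG(dag_id='hello_spark',
         start_date=datetime(2019, 7, 4),
         schedule_interval='@daily') as dag:

    compute_pi = SparkSubmitOperator(
        task_id='compute_pi',
        # Connection 'local-spark' carries the k8s master URL plus
        # extra: {'spark_home': '/usr/spark', 'deploy-mode': 'cluster'}
        conn_id='local-spark',
        application='local:///code/pi.py',
        name='airflow-spark',
        conf={
            'spark.kubernetes.namespace': 'default',
            'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName': 'airflow-dags',
            'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName': 'airflow-dags',
            'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path': '/code',
            'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path': '/code',
            'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly': 'true',
            'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly': 'true',
        },
    )
```
Note that no spark.kubernetes.container.image is set, which is exactly the misconfiguration that the improved logging surfaces below.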
Before:
```
[2019-07-07 18:03:34,465] {__init__.py:1139} INFO - Dependencies all met for
<TaskInstance: hello_spark.compute_pi 2019-07-04T00:00:00+00:00 [scheduled]>
[2019-07-07 18:03:34,486] {__init__.py:1139} INFO - Dependencies all met for
<TaskInstance: hello_spark.compute_pi 2019-07-04T00:00:00+00:00 [scheduled]>
[2019-07-07 18:03:34,487] {__init__.py:1353} INFO -
--------------------------------------------------------------------------------
[2019-07-07 18:03:34,488] {__init__.py:1354} INFO - Starting attempt 2 of 2
[2019-07-07 18:03:34,488] {__init__.py:1355} INFO -
--------------------------------------------------------------------------------
[2019-07-07 18:03:34,526] {__init__.py:1374} INFO - Executing
<Task(SparkSubmitOperator): compute_pi> on 2019-07-04T00:00:00+00:00
[2019-07-07 18:03:34,530] {base_task_runner.py:119} INFO - Running:
['airflow', 'run', 'hello_spark', 'compute_pi', '2019-07-04T00:00:00+00:00',
'--job_id', '36', '--raw', '-sd', 'DAGS_FOLDER/spark.py', '--cfg_path',
'/tmp/tmpviu_vxcj']
[2019-07-07 18:03:36,961] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi [2019-07-07 18:03:36,956] {settings.py:182} INFO -
settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800,
pid=15
[2019-07-07 18:03:37,520] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi [2019-07-07 18:03:37,508] {__init__.py:51} INFO - Using executor
LocalExecutor
[2019-07-07 18:03:38,325] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi [2019-07-07 18:03:38,323] {__init__.py:305} INFO - Filling up the
DagBag from /root/airflow/dags/spark.py
[2019-07-07 18:03:39,236] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi [2019-07-07 18:03:39,234] {cli.py:517} INFO - Running <TaskInstance:
hello_spark.compute_pi 2019-07-04T00:00:00+00:00 [running]> on host
hellosparkcomputepi-e6e6c0eba6d248848ebbe6d3102b29d1
[2019-07-07 18:03:39,403] {logging_mixin.py:95} INFO - [2019-07-07
18:03:39,403] {base_hook.py:83} INFO - Using connection to: id: local-spark.
Host: k8s://http://192.168.1.113, Port: 8080, Schema: None, Login: None,
Password: None, extra: {'spark_home': '/usr/spark', 'deploy-mode': 'cluster'}
[2019-07-07 18:03:39,407] {logging_mixin.py:95} INFO - [2019-07-07
18:03:39,405] {spark_submit_hook.py:295} INFO - Spark-Submit cmd:
['spark-submit', '--master', 'k8s://http://192.168.1.113:8080', '--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark',
'--deploy-mode', 'cluster', 'local:///code/pi.py']
[2019-07-07 18:03:45,170] {__init__.py:1580} ERROR - Cannot execute:
['spark-submit', '--master', 'k8s://http://192.168.1.113:8080', '--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark',
'--deploy-mode', 'cluster', 'local:///code/pi.py']. Error code is: 1.
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/airflow/models/__init__.py",
line 1441, in _run_raw_task
result = task_copy.execute(context=context)
File
"/usr/local/lib/python3.5/dist-packages/airflow/contrib/operators/spark_submit_operator.py",
line 176, in execute
self._hook.submit(self._application)
File
"/usr/local/lib/python3.5/dist-packages/airflow/contrib/hooks/spark_submit_hook.py",
line 352, in submit
spark_submit_cmd, returncode
airflow.exceptions.AirflowException: Cannot execute: ['spark-submit',
'--master', 'k8s://http://192.168.1.113:8080', '--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark',
'--deploy-mode', 'cluster', 'local:///code/pi.py']. Error code is: 1.
[2019-07-07 18:03:45,175] {__init__.py:1611} INFO - Marking task as FAILED.
[2019-07-07 18:03:45,206] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi /usr/local/lib/python3.5/dist-packages/psycopg2/__init__.py:144:
UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in
order to keep installing from binary please use "pip install psycopg2-binary"
instead. For details see:
<http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
[2019-07-07 18:03:45,207] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi """)
[2019-07-07 18:03:45,209] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi Traceback (most recent call last):
[2019-07-07 18:03:45,210] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi File "/usr/local/bin/airflow", line 32, in <module>
[2019-07-07 18:03:45,211] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi args.func(args)
[2019-07-07 18:03:45,211] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi File
"/usr/local/lib/python3.5/dist-packages/airflow/utils/cli.py", line 74, in
wrapper
[2019-07-07 18:03:45,212] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi return f(*args, **kwargs)
[2019-07-07 18:03:45,213] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py",
line 523, in run
[2019-07-07 18:03:45,213] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi _run(args, dag, ti)
[2019-07-07 18:03:45,214] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py",
line 442, in _run
[2019-07-07 18:03:45,215] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi pool=args.pool,
[2019-07-07 18:03:45,215] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi File "/usr/local/lib/python3.5/dist-packages/airflow/utils/db.py",
line 73, in wrapper
[2019-07-07 18:03:45,216] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi return func(*args, **kwargs)
[2019-07-07 18:03:45,217] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi File
"/usr/local/lib/python3.5/dist-packages/airflow/models/__init__.py", line 1441,
in _run_raw_task
[2019-07-07 18:03:45,217] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi result = task_copy.execute(context=context)
[2019-07-07 18:03:45,218] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi File
"/usr/local/lib/python3.5/dist-packages/airflow/contrib/operators/spark_submit_operator.py",
line 176, in execute
[2019-07-07 18:03:45,219] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi self._hook.submit(self._application)
[2019-07-07 18:03:45,220] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi File
"/usr/local/lib/python3.5/dist-packages/airflow/contrib/hooks/spark_submit_hook.py",
line 352, in submit
[2019-07-07 18:03:45,220] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi spark_submit_cmd, returncode
[2019-07-07 18:03:45,221] {base_task_runner.py:101} INFO - Job 36: Subtask
compute_pi airflow.exceptions.AirflowException: Cannot execute:
['spark-submit', '--master', 'k8s://http://192.168.1.113:8080', '--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark',
'--deploy-mode', 'cluster', 'local:///code/pi.py']. Error code is: 1.
[2019-07-07 18:03:49,544] {logging_mixin.py:95} INFO - [2019-07-07
18:03:49,542] {jobs.py:2562} INFO - Task exited with return code 1
```
Becomes:
```
[2019-07-07 19:13:08,038] {__init__.py:1139} INFO - Dependencies all met for
<TaskInstance: hello_spark.compute_pi 2019-07-04T00:00:00+00:00 [queued]>
[2019-07-07 19:13:08,102] {__init__.py:1139} INFO - Dependencies all met for
<TaskInstance: hello_spark.compute_pi 2019-07-04T00:00:00+00:00 [queued]>
[2019-07-07 19:13:08,105] {__init__.py:1353} INFO -
--------------------------------------------------------------------------------
[2019-07-07 19:13:08,107] {__init__.py:1354} INFO - Starting attempt 3 of 3
[2019-07-07 19:13:08,112] {__init__.py:1355} INFO -
--------------------------------------------------------------------------------
[2019-07-07 19:13:08,209] {__init__.py:1374} INFO - Executing
<Task(SparkSubmitOperator): compute_pi> on 2019-07-04T00:00:00+00:00
[2019-07-07 19:13:08,211] {base_task_runner.py:119} INFO - Running:
['airflow', 'run', 'hello_spark', 'compute_pi', '2019-07-04T00:00:00+00:00',
'--job_id', '42', '--raw', '-sd', 'DAGS_FOLDER/spark.py', '--cfg_path',
'/tmp/tmpvrp9os2m']
[2019-07-07 19:13:10,715] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi [2019-07-07 19:13:10,715] {settings.py:182} INFO -
settings.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800,
pid=15
[2019-07-07 19:13:10,925] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi [2019-07-07 19:13:10,924] {__init__.py:51} INFO - Using executor
LocalExecutor
[2019-07-07 19:13:11,432] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi [2019-07-07 19:13:11,431] {__init__.py:305} INFO - Filling up the
DagBag from /root/airflow/dags/spark.py
[2019-07-07 19:13:12,184] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi [2019-07-07 19:13:12,182] {cli.py:517} INFO - Running <TaskInstance:
hello_spark.compute_pi 2019-07-04T00:00:00+00:00 [running]> on host
hellosparkcomputepi-477f1a3c92534128a7dc03183888b1a8
[2019-07-07 19:13:12,283] {logging_mixin.py:95} INFO - [2019-07-07
19:13:12,283] {base_hook.py:83} INFO - Using connection to: id: local-spark.
Host: k8s://http://192.168.1.113, Port: 8080, Schema: None, Login: None,
Password: None, extra: {'spark_home': '/usr/spark', 'deploy-mode': 'cluster'}
[2019-07-07 19:13:12,286] {logging_mixin.py:95} INFO - [2019-07-07
19:13:12,284] {spark_submit_hook.py:295} INFO - Spark-Submit cmd:
['spark-submit', '--master', 'k8s://http://192.168.1.113:8080', '--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark',
'--deploy-mode', 'cluster', 'local:///code/pi.py']
[2019-07-07 19:13:14,983] {logging_mixin.py:95} INFO - [2019-07-07
19:13:14,983] {spark_submit_hook.py:426} INFO - 19/07/07 19:13:14 WARN Utils:
Kubernetes master URL uses HTTP instead of HTTPS.
[2019-07-07 19:13:16,296] {logging_mixin.py:95} INFO - [2019-07-07
19:13:16,295] {spark_submit_hook.py:426} INFO - log4j:WARN No appenders could
be found for logger (io.fabric8.kubernetes.client.Config).
[2019-07-07 19:13:16,297] {logging_mixin.py:95} INFO - [2019-07-07
19:13:16,297] {spark_submit_hook.py:426} INFO - log4j:WARN Please initialize
the log4j system properly.
[2019-07-07 19:13:16,298] {logging_mixin.py:95} INFO - [2019-07-07
19:13:16,298] {spark_submit_hook.py:426} INFO - log4j:WARN See
http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2019-07-07 19:13:17,114] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,114] {spark_submit_hook.py:426} INFO - Exception in thread "main"
org.apache.spark.SparkException: Must specify the driver container image
[2019-07-07 19:13:17,121] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,121] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep$$anonfun$3.apply(BasicDriverFeatureStep.scala:42)
[2019-07-07 19:13:17,123] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,123] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep$$anonfun$3.apply(BasicDriverFeatureStep.scala:42)
[2019-07-07 19:13:17,128] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,127] {spark_submit_hook.py:426} INFO - at
scala.Option.getOrElse(Option.scala:121)
[2019-07-07 19:13:17,130] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,130] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.<init>(BasicDriverFeatureStep.scala:42)
[2019-07-07 19:13:17,132] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,131] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder$$anonfun$$lessinit$greater$default$1$1.apply(KubernetesDriverBuilder.scala:25)
[2019-07-07 19:13:17,135] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,135] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder$$anonfun$$lessinit$greater$default$1$1.apply(KubernetesDriverBuilder.scala:25)
[2019-07-07 19:13:17,138] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,138] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:59)
[2019-07-07 19:13:17,143] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,143] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:110)
[2019-07-07 19:13:17,145] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,144] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
[2019-07-07 19:13:17,146] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,145] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
[2019-07-07 19:13:17,148] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,147] {spark_submit_hook.py:426} INFO - at
org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
[2019-07-07 19:13:17,151] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,151] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
[2019-07-07 19:13:17,155] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,155] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
[2019-07-07 19:13:17,156] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,156] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
[2019-07-07 19:13:17,158] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,157] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
[2019-07-07 19:13:17,160] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,160] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
[2019-07-07 19:13:17,161] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,161] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
[2019-07-07 19:13:17,162] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,161] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
[2019-07-07 19:13:17,162] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,162] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
[2019-07-07 19:13:17,163] {logging_mixin.py:95} INFO - [2019-07-07
19:13:17,163] {spark_submit_hook.py:426} INFO - at
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[2019-07-07 19:13:17,575] {__init__.py:1580} ERROR - Cannot execute:
['spark-submit', '--master', 'k8s://http://192.168.1.113:8080', '--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark',
'--deploy-mode', 'cluster', 'local:///code/pi.py']. Error code is: 1.
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/airflow/models/__init__.py",
line 1441, in _run_raw_task
result = task_copy.execute(context=context)
File "/root/airflow/dags/operators/spark_submit_operator.py", line 176, in
execute
self._hook.submit(self._application)
File "/root/airflow/dags/hooks/spark_submit_hook.py", line 352, in submit
spark_submit_cmd, returncode
airflow.exceptions.AirflowException: Cannot execute: ['spark-submit',
'--master', 'k8s://http://192.168.1.113:8080', '--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark',
'--deploy-mode', 'cluster', 'local:///code/pi.py']. Error code is: 1.
[2019-07-07 19:13:17,594] {__init__.py:1611} INFO - Marking task as FAILED.
[2019-07-07 19:13:17,626] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi /usr/local/lib/python3.5/dist-packages/psycopg2/__init__.py:144:
UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in
order to keep installing from binary please use "pip install psycopg2-binary"
instead. For details see:
<http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
[2019-07-07 19:13:17,627] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi """)
[2019-07-07 19:13:17,640] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi Traceback (most recent call last):
[2019-07-07 19:13:17,646] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi File "/usr/local/bin/airflow", line 32, in <module>
[2019-07-07 19:13:17,648] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi args.func(args)
[2019-07-07 19:13:17,649] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi File
"/usr/local/lib/python3.5/dist-packages/airflow/utils/cli.py", line 74, in
wrapper
[2019-07-07 19:13:17,649] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi return f(*args, **kwargs)
[2019-07-07 19:13:17,651] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py",
line 523, in run
[2019-07-07 19:13:17,652] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi _run(args, dag, ti)
[2019-07-07 19:13:17,652] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py",
line 442, in _run
[2019-07-07 19:13:17,653] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi pool=args.pool,
[2019-07-07 19:13:17,653] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi File "/usr/local/lib/python3.5/dist-packages/airflow/utils/db.py",
line 73, in wrapper
[2019-07-07 19:13:17,656] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi return func(*args, **kwargs)
[2019-07-07 19:13:17,657] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi File
"/usr/local/lib/python3.5/dist-packages/airflow/models/__init__.py", line 1441,
in _run_raw_task
[2019-07-07 19:13:17,662] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi result = task_copy.execute(context=context)
[2019-07-07 19:13:17,663] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi File "/root/airflow/dags/operators/spark_submit_operator.py", line
176, in execute
[2019-07-07 19:13:17,664] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi self._hook.submit(self._application)
[2019-07-07 19:13:17,665] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi File "/root/airflow/dags/hooks/spark_submit_hook.py", line 352, in
submit
[2019-07-07 19:13:17,668] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi spark_submit_cmd, returncode
[2019-07-07 19:13:17,669] {base_task_runner.py:101} INFO - Job 42: Subtask
compute_pi airflow.exceptions.AirflowException: Cannot execute:
['spark-submit', '--master', 'k8s://http://192.168.1.113:8080', '--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.options.claimName=airflow-dags',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf',
'spark.kubernetes.driver.volumes.persistentVolumeClaim.shared.mount.readOnly=true',
'--conf',
'spark.kubernetes.executor.volumes.persistentVolumeClaim.shared.mount.path=/code',
'--conf', 'spark.kubernetes.namespace=default', '--name', 'airflow-spark',
'--deploy-mode', 'cluster', 'local:///code/pi.py']. Error code is: 1.
[2019-07-07 19:13:18,137] {logging_mixin.py:95} INFO - [2019-07-07
19:13:18,137] {jobs.py:2630} WARNING - State of this instance has been
externally set to failed. Taking the poison pill.
[2019-07-07 19:13:18,157] {helpers.py:281} INFO - Sending Signals.SIGTERM to
GPID 15
[2019-07-07 19:13:18,238] {helpers.py:263} INFO - Process
psutil.Process(pid=15, status='terminated') (15) terminated with exit code -15
[2019-07-07 19:13:18,270] {logging_mixin.py:95} INFO - [2019-07-07
19:13:18,259] {jobs.py:2562} INFO - Task exited with return code 0
```
Without this change, it would be very hard to find errors in the spark-submit configuration (in this case, the missing driver container image).
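The underlying idea is to stream the spark-submit subprocess output into the task log as it is produced, instead of only reporting the exit code. A minimal sketch of that approach (illustrative only, not the exact code in this PR) looks like:
```
# Minimal, illustrative sketch of forwarding spark-submit output to the logs;
# this is not the exact implementation merged in the PR.
import logging
import subprocess

log = logging.getLogger(__name__)

def run_spark_submit(spark_submit_cmd):
    process = subprocess.Popen(
        spark_submit_cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr so Spark's own errors are captured too
        universal_newlines=True,
    )
    # Forward every line of spark-submit output to the log as it arrives, so
    # configuration errors (e.g. a missing driver container image) are visible.
    for line in iter(process.stdout.readline, ''):
        log.info(line.rstrip())
    returncode = process.wait()
    if returncode:
        raise RuntimeError(
            'Cannot execute: {}. Error code is: {}.'.format(spark_submit_cmd, returncode)
        )
```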
### Jira
- [ ] My PR addresses the following [Airflow
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references
them in the PR title. For example, "\[AIRFLOW-4906\] My Airflow PR"
- https://issues.apache.org/jira/browse/AIRFLOW-4906
- In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-4906\]; code changes always need a Jira issue.
- In case you are proposing a fundamental code change, you need to create
an Airflow Improvement Proposal
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
- In case you are adding a dependency, check if the license complies with
the [ASF 3rd Party License
Policy](https://www.apache.org/legal/resolved.html#category-x).
### Description
- [ ] Here are some details about my PR, including screenshots of any UI
changes:
### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for
this extremely good reason:
### Commits
- [ ] My commits all reference Jira issues in their subject lines, and I
have squashed multiple commits if they address the same issue. In addition, my
commits follow the guidelines from "[How to write a good git commit
message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
1. Subject is limited to 50 characters (not including Jira issue reference)
1. Subject does not end with a period
1. Subject uses the imperative mood ("add", not "adding")
1. Body wraps at 72 characters
1. Body explains "what" and "why", not "how"
### Documentation
- [ ] In case of new functionality, my PR adds documentation that describes
how to use it.
- All the public functions and classes in the PR contain docstrings that explain what they do
- If you implement backwards-incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to an appropriate release
### Code Quality
- [ ] Passes `flake8`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Improve debugging for the SparkSubmitHook
> -----------------------------------------
>
> Key: AIRFLOW-4906
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4906
> Project: Apache Airflow
> Issue Type: Improvement
> Components: hooks
> Affects Versions: 1.10.3
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: Major
> Fix For: 2.0.0
>
>
> Currently, the output of the spark-submit command is not being sent to the
> logs. This makes debugging of the k8s jobs rather hard. For example, if you
> make a typo, you will only get the exit code, which is non-descriptive.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)