ashb commented on a change in pull request #5352: [AIRFLOW-4717] The
spark_binary option does not apply in sparkSubmitO…
URL: https://github.com/apache/airflow/pull/5352#discussion_r291071447
##########
File path: airflow/contrib/hooks/spark_submit_hook.py
##########
@@ -187,7 +187,6 @@ def _resolve_connection(self):
conn_data['queue'] = extra.get('queue', None)
conn_data['deploy_mode'] = extra.get('deploy-mode', None)
conn_data['spark_home'] = extra.get('spark-home', None)
- conn_data['spark_binary'] = extra.get('spark-binary',
"spark-submit")
Review comment:
This is not the right fix, as it breaks anyone who has specified
`spark-binary` at the connection level.
```
======================================================================
23) FAIL: test_resolve_connection_spark_binary_set_connection
(tests.contrib.hooks.test_spark_submit_hook.TestSparkSubmitHook)
----------------------------------------------------------------------
Traceback (most recent call last):
tests/contrib/hooks/test_spark_submit_hook.py line 382 in
test_resolve_connection_spark_binary_set_connection
self.assertEqual(connection, expected_spark_connection)
AssertionError: {'nam[32 chars]y': 'spark-submit', 'master': 'yarn',
'spark_h[42 chars]None} != {'nam[32 chars]y': 'custom-spark-submit', 'master':
'yarn', '[49 chars]None}
{'deploy_mode': None,
'master': 'yarn',
'namespace': 'default',
'queue': None,
- 'spark_binary': 'spark-submit',
+ 'spark_binary': 'custom-spark-submit',
? +++++++
'spark_home': None}
-------------------- >> begin captured logging << --------------------
airflow.utils.log.logging_mixin.LoggingMixin: INFO: Using connection to:
id: spark_binary_set. Host: yarn, Port: None, Schema: None, Login: None,
Password: None, extra: {'spark-binary': 'custom-spark-submit'}
airflow.utils.log.logging_mixin.LoggingMixin: INFO: Using connection to:
id: spark_binary_set. Host: yarn, Port: None, Schema: None, Login: None,
Password: None, extra: {'spark-binary': 'custom-spark-submit'}
airflow.contrib.hooks.spark_submit_hook.SparkSubmitHook: INFO:
Spark-Submit cmd: ['spark-submit', '--master', 'yarn', '--name',
'default-name', 'test_application.py']
--------------------- >> end captured logging << ---------------------
```
Rather than being removed, this line should be something like:
```
conn_data['spark_binary'] = extra.get('spark-binary',
self._spark_binary)
```
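To illustrate the precedence I have in mind, here is a minimal standalone sketch. `resolve_spark_binary` and the `"spark2-submit"` value are hypothetical, made up for this example only; they are not part of the hook. The sketch assumes `self._spark_binary` holds whatever was passed to the hook, defaulting to `"spark-submit"`.

```python
def resolve_spark_binary(extra, hook_spark_binary="spark-submit"):
    """Hypothetical helper, not part of SparkSubmitHook.

    `extra` stands in for the connection's parsed extra dict and
    `hook_spark_binary` for the value stored on the hook.
    """
    # A connection-level 'spark-binary' wins; otherwise fall back to the
    # hook-level value instead of a hard-coded "spark-submit".
    return extra.get("spark-binary", hook_spark_binary)


# Connection-level setting takes precedence over the hook-level value.
print(resolve_spark_binary({"spark-binary": "custom-spark-submit"}, "spark2-submit"))

# No connection-level setting: the hook-level value is used.
print(resolve_spark_binary({}, "spark2-submit"))
```

With this fallback the existing connection-level test should keep passing, and a hook-level value is honoured when the connection does not set one.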