[
https://issues.apache.org/jira/browse/AIRFLOW-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on AIRFLOW-7026 started by Kengo Seki.
-------------------------------------------
> Improve SparkSqlHook's error message
> ------------------------------------
>
> Key: AIRFLOW-7026
> URL: https://issues.apache.org/jira/browse/AIRFLOW-7026
> Project: Apache Airflow
> Issue Type: Improvement
> Components: hooks
> Affects Versions: 1.10.9
> Reporter: Kengo Seki
> Assignee: Kengo Seki
> Priority: Major
>
> If {{SparkSqlHook.run_query()}} fails, it raises the following exception.
> {code}
>         if returncode:
>             raise AirflowException(
>                 "Cannot execute {} on {}. Process exit code: {}.".format(
>                     cmd, self._conn.host, returncode
>                 )
>             )
> {code}
> However, this message is actually not very useful. For example:
> {code}
> In [1]: from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator
>
> In [2]: SparkSqlOperator(sql="SELECT * FROM NON_EXISTENT_TABLE", master="local[*]", conn_id="spark_default", task_id="_").execute(None)
>
> (snip)
> ---------------------------------------------------------------------------
> AirflowException                          Traceback (most recent call last)
> <ipython-input-2-d69c4454e999> in <module>
> ----> 1 SparkSqlOperator(sql="SELECT * FROM NON_EXISTENT_TABLE", master="local[*]", conn_id="spark_default", task_id="_").execute(None)
>
> ~/repos/incubator-airflow/airflow/providers/apache/spark/operators/spark_sql.py in execute(self, context)
>     105             yarn_queue=self._yarn_queue
>     106         )
> --> 107         self._hook.run_query()
>     108
>     109     def on_kill(self):
>
> ~/repos/incubator-airflow/airflow/providers/apache/spark/hooks/spark_sql.py in run_query(self, cmd, **kwargs)
>     154             raise AirflowException(
>     155                 "Cannot execute {} on {}. Process exit code: {}.".format(
> --> 156                     cmd, self._conn.host, returncode
>     157                 )
>     158             )
> AirflowException: Cannot execute on yarn. Process exit code: 1.
> {code}
> Most users would expect the executed query to be shown as the first argument
> in the exception message and the "master" value (i.e., "local[*]" here) as
> the second, but meaningless information (an empty string and "yarn") appears
> instead. The reasons are as follows:
> * The executed query is specified by the "sql" parameter of the
> {{SparkSqlHook.\_\_init__}} method, not by {{cmd}}.
> * The "master" value is specified by the "master" parameter of the
> {{SparkSqlHook.\_\_init__}} method, not by {{self._conn.host}}. In fact,
> {{self._conn}} is not used at all in {{SparkSqlHook}}.
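A minimal, self-contained sketch of the kind of improvement described above: raising the exception with the hook's own "sql" and "master" values instead of the unused {{cmd}} and {{self._conn.host}}. The class and attribute names here are illustrative stand-ins, not the actual Airflow implementation; only the constructor parameter names "sql" and "master" are taken from the issue description.

```python
class AirflowException(Exception):
    """Stand-in for airflow.exceptions.AirflowException."""


class SparkSqlHookSketch:
    """Illustrative sketch; stores the values SparkSqlHook's
    __init__ receives via its "sql" and "master" parameters."""

    def __init__(self, sql, master="yarn"):
        self._sql = sql
        self._master = master

    def run_query(self):
        returncode = 1  # simulate a failed spark-sql subprocess
        if returncode:
            # Report the actual query and master, which is what
            # most users would expect to see in the error message.
            raise AirflowException(
                "Cannot execute '{}' on {}. Process exit code: {}.".format(
                    self._sql, self._master, returncode
                )
            )


# With this change, the failing example from above would report
# the real query and "local[*]" instead of an empty string and "yarn".
hook = SparkSqlHookSketch(sql="SELECT * FROM NON_EXISTENT_TABLE",
                          master="local[*]")
try:
    hook.run_query()
except AirflowException as e:
    print(e)
```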
--
This message was sent by Atlassian Jira
(v8.3.4#803005)