[ 
https://issues.apache.org/jira/browse/AIRFLOW-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-7026 started by Kengo Seki.
-------------------------------------------
> Improve SparkSqlHook's error message
> ------------------------------------
>
>                 Key: AIRFLOW-7026
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-7026
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: hooks
>    Affects Versions: 1.10.9
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Major
>
> If {{SparkSqlHook.run_query()}} fails, it raises the following exception.
> {code}
>         if returncode:
>             raise AirflowException(
>                 "Cannot execute {} on {}. Process exit code: {}.".format(
>                     cmd, self._conn.host, returncode
>                 )
>             )
> {code}
> But this message is not very useful in practice. For example:
> {code}
> In [1]: from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator
>
> In [2]: SparkSqlOperator(sql="SELECT * FROM NON_EXISTENT_TABLE", master="local[*]", conn_id="spark_default", task_id="_").execute(None)
>
> (snip)
> ---------------------------------------------------------------------------
> AirflowException                          Traceback (most recent call last)
> <ipython-input-2-d69c4454e999> in <module>
> ----> 1 SparkSqlOperator(sql="SELECT * FROM NON_EXISTENT_TABLE", master="local[*]", conn_id="spark_default", task_id="_").execute(None)
> ~/repos/incubator-airflow/airflow/providers/apache/spark/operators/spark_sql.py in execute(self, context)
>     105                                   yarn_queue=self._yarn_queue
>     106                                   )
> --> 107         self._hook.run_query()
>     108 
>     109     def on_kill(self):
> ~/repos/incubator-airflow/airflow/providers/apache/spark/hooks/spark_sql.py in run_query(self, cmd, **kwargs)
>     154             raise AirflowException(
>     155                 "Cannot execute {} on {}. Process exit code: {}.".format(
> --> 156                     cmd, self._conn.host, returncode
>     157                 )
>     158             )
> AirflowException: Cannot execute  on yarn. Process exit code: 1.
> {code}
> Most users would expect the executed query to be shown in the first 
> placeholder of the exception message and the "master" value (i.e., "local[*]" 
> here) in the second, but meaningless information (an empty string and "yarn") 
> is shown instead.
> The reasons are as follows:
> * The executed query is specified by the "sql" parameter of the 
> {{SparkSqlHook.__init__}} method, not by {{cmd}}.
> * The "master" value is specified by the "master" parameter of the 
> {{SparkSqlHook.__init__}} method, not by {{self._conn.host}}. In fact, 
> {{self._conn}} is not used at all in SparkSqlHook.
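
One possible shape of the improvement described above, as a minimal sketch rather than the actual patch: the message is built from the values passed to the constructor instead of from {{cmd}} and {{self._conn.host}}. The class SparkSqlHookSketch, its check_returncode method, and the attribute names _sql and _master are hypothetical stand-ins, and AirflowException is redefined locally so that the snippet runs without Airflow installed.

```python
class AirflowException(Exception):
    """Local stand-in for airflow.exceptions.AirflowException (assumption)."""


class SparkSqlHookSketch:
    """Hypothetical, trimmed-down stand-in for SparkSqlHook."""

    def __init__(self, sql, master="yarn"):
        # Both values come from the constructor, as the issue describes.
        self._sql = sql
        self._master = master

    def check_returncode(self, returncode):
        """Raise with the query and master that were actually used."""
        if returncode:
            raise AirflowException(
                "Cannot execute '{}' on {}. Process exit code: {}.".format(
                    self._sql, self._master, returncode
                )
            )


# Usage: simulate a failed spark-sql process (non-zero exit code).
hook = SparkSqlHookSketch(sql="SELECT * FROM NON_EXISTENT_TABLE", master="local[*]")
try:
    hook.check_returncode(1)
    message = None
except AirflowException as exc:
    message = str(exc)
print(message)
```

With this change the failing example would report the query text and "local[*]" rather than an empty string and "yarn".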



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
