boittega opened a new issue #8713:
URL: https://github.com/apache/airflow/issues/8713


   **Apache Airflow version**: 1.10.10
   
   **What happened**:
   
   `SparkSqlHook` does not use any connection. The default conn_id is `spark_sql_default`, and if this connection doesn't exist, the hook raises an error:
   ```
   Traceback (most recent call last):
     File "/Users/rbottega/Documents/airflow_latest/env/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
       result = task_copy.execute(context=context)
     File "/Users/rbottega/Documents/airflow_latest/env/lib/python3.7/site-packages/airflow/contrib/operators/spark_sql_operator.py", line 109, in execute
       yarn_queue=self._yarn_queue
     File "/Users/rbottega/Documents/airflow_latest/env/lib/python3.7/site-packages/airflow/contrib/hooks/spark_sql_hook.py", line 75, in __init__
       self._conn = self.get_connection(conn_id)
     File "/Users/rbottega/Documents/airflow_latest/env/lib/python3.7/site-packages/airflow/hooks/base_hook.py", line 84, in get_connection
       conn = random.choice(list(cls.get_connections(conn_id)))
     File "/Users/rbottega/Documents/airflow_latest/env/lib/python3.7/site-packages/airflow/hooks/base_hook.py", line 80, in get_connections
       return secrets.get_connections(conn_id)
     File "/Users/rbottega/Documents/airflow_latest/env/lib/python3.7/site-packages/airflow/secrets/__init__.py", line 56, in get_connections
       raise AirflowException("The conn_id `{0}` isn't defined".format(conn_id))
   airflow.exceptions.AirflowException: The conn_id `spark_sql_default` isn't defined
   ```
   If any valid connection is specified, it has no effect: the `self._conn` attribute is never used and the `get_conn` method is empty.
   ```
       def get_conn(self):
           pass
   ```
   
   **What you expected to happen**:
   
   It should either follow the same behaviour as `SparkSubmitHook` and read the master host and extra parameters from the connection, or not require a connection ID at all.
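   
   For illustration, here is a minimal sketch of the first option (the function name and structure are my own assumption, loosely modelled on how `SparkSubmitHook` resolves its connection, not an actual patch):
   ```
   from airflow.exceptions import AirflowException
   from airflow.hooks.base_hook import BaseHook


   def resolve_spark_sql_master(conn_id, fallback_master="yarn"):
       """Derive the Spark master from an Airflow connection, falling back to
       an explicit master when the connection is missing."""
       try:
           conn = BaseHook.get_connection(conn_id)
       except AirflowException:
           # Connection not defined: behave like the second option and
           # just use the fallback master instead of failing.
           return fallback_master
       master = conn.host or fallback_master
       if conn.host and conn.port:
           master = "{}:{}".format(conn.host, conn.port)
       return master
   ```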
   
   **How to reproduce it**:
   Create a DAG with a `SparkSqlOperator` without having created the `spark_sql_default` connection:
   ```
    from airflow import DAG
    from airflow.contrib.operators.spark_sql_operator import SparkSqlOperator
    from airflow.utils.dates import days_ago

    with DAG("spark_sql_example", start_date=days_ago(1)) as dag:
        sql_job = SparkSqlOperator(
            sql="SELECT * FROM test",
            master="local",
            task_id="sql_job",
        )
   ```
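   
   As a side note, one possible workaround (my assumption, not part of the report) is to define the missing connection through Airflow's `AIRFLOW_CONN_*` environment-variable lookup so the hook's lookup succeeds, even though the connection is then ignored; the variable has to be set in the environment where the task actually runs (shown here via `os.environ` purely for illustration):
   ```
   import os

   # Airflow 1.10 resolves connections from AIRFLOW_CONN_<CONN_ID> environment
   # variables holding a connection URI, so this makes `spark_sql_default`
   # resolvable and avoids the "conn_id isn't defined" error.
   os.environ["AIRFLOW_CONN_SPARK_SQL_DEFAULT"] = "spark://local"
   ```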
   
   
   **Anything else we need to know**:
   
   I am happy to implement either of these solutions.
   

