wavewater opened a new issue #17135:
URL: https://github.com/apache/airflow/issues/17135


   
   When calling the Exasol hook's get_pandas_df function 
(https://github.com/apache/airflow/blob/main/airflow/providers/exasol/hooks/exasol.py),
 I noticed that it returns None rather than a pandas DataFrame. The type hint in 
the function definition even states explicitly that None is returned, but the 
name get_pandas_df implies that it should return a DataFrame.
   
   I think it would make more sense for get_pandas_df to return a DataFrame, 
as its name suggests. The code should look like this:
   
   `def get_pandas_df(self, sql: Union[str, list], parameters: Optional[dict] = None, **kwargs) -> pd.DataFrame:
       ... some code ...
       with closing(self.get_conn()) as conn:
           df = conn.export_to_pandas(sql, query_params=parameters, **kwargs)
           return df`
   
   INSTEAD OF:
   
   `def get_pandas_df(self, sql: Union[str, list], parameters: Optional[dict] = None, **kwargs) -> None:
       ... some code ...
       with closing(self.get_conn()) as conn:
           conn.export_to_pandas(sql, query_params=parameters, **kwargs)`
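
   The difference between the two variants can be reproduced without an Exasol 
database. The following is a minimal sketch, not the provider's code: 
FakeConnection and the two module-level functions are hypothetical stand-ins, 
and the fake export_to_pandas returns a plain list instead of a DataFrame so 
that the example needs nothing beyond the standard library.

```python
from contextlib import closing


class FakeConnection:
    """Hypothetical stand-in for the object returned by get_conn()."""

    def __init__(self):
        self.closed = False

    def export_to_pandas(self, sql, query_params=None, **kwargs):
        # A real connection would return a pandas DataFrame here; a list of
        # rows is enough to show what happens to the result.
        return [{"sql": sql, "params": query_params}]

    def close(self):
        self.closed = True


def get_pandas_df_current(conn, sql, parameters=None, **kwargs):
    # Mirrors the reported behaviour: the export result is discarded,
    # so the function implicitly returns None.
    with closing(conn):
        conn.export_to_pandas(sql, query_params=parameters, **kwargs)


def get_pandas_df_proposed(conn, sql, parameters=None, **kwargs):
    # Mirrors the proposed change: the export result is handed back.
    with closing(conn):
        df = conn.export_to_pandas(sql, query_params=parameters, **kwargs)
        return df


current_result = get_pandas_df_current(FakeConnection(), "select 42;")
proposed_result = get_pandas_df_proposed(FakeConnection(), "select 42;")
print(current_result)   # None
print(proposed_result)  # the rows produced by the fake export
```

   Returning from inside the with block is safe here: closing() still calls 
close() on the connection when the block exits.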
   
   **Apache Airflow version**: 2.1.0
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl 
version`): Not using Kubernetes
   
   **Environment**: Official Airflow-Docker Image
   
   - **Cloud provider or hardware configuration**: no cloud - docker host (DELL 
Server with 48 Cores, 512GB RAM and many TB storage)
   - **OS** (e.g. from /etc/os-release): Official Airflow-Docker Image on CentOS 
7 Host
   - **Kernel** (e.g. `uname -a`): Linux cad18b35be00 
3.10.0-1160.21.1.el7.x86_64 #1 SMP Tue Mar 16 18:28:22 UTC 2021 x86_64 GNU/Linux
   - **Install tools**: only docker
   - **Others**:
   
   **What happened**:
   You can replicate the findings with the following DAG file:
   
   import datetime
   
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.providers.exasol.hooks.exasol import ExasolHook
   import pandas as pd
   
   
   default_args = {"owner": "airflow"}
   
   
   def call_exasol_hook(**kwargs):
       # Make connection to Exasol
       hook = ExasolHook(exasol_conn_id='Exasol QA')
       sql = 'select 42;'
       df = hook.get_pandas_df(sql=sql)
       return df
       
   with DAG(
       dag_id="exasol_hook_problem",
       start_date=datetime.datetime(2021, 5, 5),
       schedule_interval="@once",
       default_args=default_args,
       catchup=False,
   ) as dag:
         
       set_variable = PythonOperator(
           task_id='call_exasol_hook',
           python_callable=call_exasol_hook
       )
   
   Sorry for the strange code formatting. I do not know how to fix this in the 
GitHub UI form. 
   Sorry also in case I missed something.
    
   When testing or executing the task via CLI:
   ` airflow tasks test exasol_hook_problem call_exasol_hook 2021-07-20`
   
   the logs show:
   `[2021-07-21 12:53:19,775] {python.py:151} INFO - Done. Returned value was: 
None`
   
   None was returned even though get_pandas_df was called. A pandas DataFrame 
should have been returned instead.
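
   Until the hook returns the DataFrame itself, one possible workaround is to 
go through the connection directly. This is only a sketch: 
fetch_df_via_connection is a hypothetical helper name, and it assumes the hook 
exposes get_conn() and that the connection provides export_to_pandas(sql, 
query_params=...), as the code quoted in this issue does.

```python
from contextlib import closing


def fetch_df_via_connection(hook, sql, parameters=None, **kwargs):
    """Hypothetical helper: run the export on the hook's connection and
    return whatever export_to_pandas produces."""
    # closing() makes sure the connection is closed after the export,
    # the same pattern used in the hook code quoted in this issue.
    with closing(hook.get_conn()) as conn:
        return conn.export_to_pandas(sql, query_params=parameters, **kwargs)
```

   In the DAG above, call_exasol_hook could then use 
`df = fetch_df_via_connection(hook, sql)` in place of the call to 
hook.get_pandas_df.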
   
   



