[I] ADBC Python Postgres - Stuck connections to the database [arrow-adbc]

via GitHub Wed, 22 May 2024 09:31:05 -0700


gaspardc-met opened a new issue, #1881:
URL: https://github.com/apache/arrow-adbc/issues/1881


   ### What happened?
   
   Context before the bug (working):
   - Postgres database on Kubernetes with several tables
   - 4 services (webapp, machine learning inference, and FastAPI backend APIs) 
deployed on kubernetes and fetching data from postgres
   - 1 service, data orchestrator, writing data to Postgres
   - Fetching data from PG with `pd.read_sql` from Pandas and a SQLalchemy 
engine
   - Been doing this for 1+ year without any Postgres issues
   
   Switching to ADBC:
   - Following my upgrade to pandas >2.0.0 I wanted to switch to 
`adbc_driver_postgresql`'s `dbapi` connection with `pd.read_sql`
   - Initial tests were great, it was faster than before
   - Deployed this to production on all aforementioned services twice 
(initially with connection caching, then no caching and properly closing each 
and every connection
   - Once again initially smooth, everything worked and was fast
   
   Problem:
   - In both instances of the deployment, within ~12 hours, the connections 
would be stuck
   - Webapp or another service would create an ADBC connection, and run the sql 
query with `pd.read_sql` (know this through caching) and then wait 
indefinitely. 
   - Reloading the webapp, clearing webapp cache, recreating the connection 
would do nothing at all
   - The log on the Postgres pod indicated a password issue with the current 
database/user, which never happened before
   - Both SQLalchemy and ADBC get the same postgres URI to create the 
engine/connection with
   - Reverting to SQLalchemy solved the problem, and the error has not been 
seen again
   
   ### How can we reproduce the bug?
   
   - The given URI was `"postgresql://{user}:{password}@{host}:{port}/{db}"` 
formatted with the proper values
   - The function was used to create the ADBC connection:
   
   ```python
   def create_adbc_conn() -> Connection:
       logger_stdout.info(f"Creating a new ADBC connection at 
{pd.Timestamp.now()}.")
       uri = get_default_uri() # URI shown above, formatted
       connection = dbapi.connect(uri=uri)
       logger_stdout.info("ADBC connection created")
       return connection
   ```
   
   - The function to execute the SQL query was:
   ```python
   def handle_sql_query(
       sql: str,
       index_col: Optional[str] = None,
       connection: Optional[Connection] = None,
       need_to_close: bool = False,
   ) -> pd.DataFrame:
       if engine is None:
           logger_stdout.info(f"Engine is None, creating a new ADBC connection 
at {pd.Timestamp.now()}.")
           connection= create_adbc_conn()
           need_to_close = True
       try:
           logger_stdout.info("Executing SQL query with connection")
           return pd.read_sql_query(sql=sql, con=connection, 
index_col=index_col, parse_dates=[index_col])
       finally:
           if need_to_close:
               logger_stdout.info("Closing the ADBC connection.")
               connection.close()
   ```
   
   - The SQL queries ranged from `select * from TABLE_NAME` to selecting 
specific columns on a range of specific dates
   
   ### Environment/Setup
   
   python 3.11
   pandas == 2.2.2
   adbc_driver_postgresql==0.11.0
   adbc-driver-manager==0.11.0
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[I] ADBC Python Postgres - Stuck connections to the database [arrow-adbc]

Reply via email to