Hey dear fellows, calling for your help here.

We are experiencing a lot of random annoying InterfaceError.
Affected endpoints are rather different (a set of ~10 endpoints), and I see 
no common behaviour in them. For example, it sometimes fails even in the 
admin interface during GET requests.
It does not happen locally, and I have found no way of reproducing it yet.

Our setup is pretty standard:
- Python 3.11, Django 4.1. PostgreSQL 14.5
- We are hosted on Heroku, waitress works as a webserver (6 threads), 2-5 
servers (dynos) could be running
- On top of that, we run 1-2 dramatiq workers (1 process, 4 threads each)
- Database statement timeout is set to 10000 (10 seconds)
- ATOMIC_REQUESTS=False (Django default)
- AUTOCOMMIT=True (Django default)
- And we are using persistent database connections (we tried setting - 
DATABASE_CONN_MAX_AGE between 10 and 600 seconds)
- We did set CONN_HEALTH_CHECKS=True, but this did not change the picture
- We are not using PGBouncer
- There are no issues with reaching connection limits, we have plenty of 
space there
PostgreSQL logs are not showing anything unusual. 

After chatting with Heroku support and analysing logs from their side, we 
found out that it is our Django instance that is closing connection, not 
Postgres.
I did add some logging to our database backend:
https://gist.github.com/amureki/20aa1a5e2e08d2d62f803b80d85b86da

This showed to me that sometimes a request will open up a connection and 
will instantly close it due to autocommit check, then the request will fail 
with InterfaceError. Example log:

 Nov 28 17:28:24 app/web.2 at=INFO 
request_id=aa3101c9-0502-4015-86d3-2fc420815240 
logger=db.backends.postgis.base Opened connection 1105751 user_agent="X"
 Nov 28 17:28:24 app/web.2 at=INFO 
request_id=aa3101c9-0502-4015-86d3-2fc420815240 
logger=db.backends.postgis.base Checking if connection 1105751 is usable 
user_agent="X"
 Nov 28 17:28:24 app/web.2 at=INFO 
request_id=aa3101c9-0502-4015-86d3-2fc420815240 
logger=db.backends.postgis.base Closing connection due to autocommit 
user_agent="X"
 Nov 28 17:28:24 app/web.2 at=INFO 
request_id=aa3101c9-0502-4015-86d3-2fc420815240 
logger=db.backends.postgis.base Closed connection 1105751 user_agent="X"
 Nov 28 17:28:24 app/web.2 at=ERROR 
request_id=aa3101c9-0502-4015-86d3-2fc420815240 
logger=db.backends.postgis.base Attempted to use a closed connection. 
Reconnecting. user_agent="X"
 Nov 28 17:28:24 app/web.2 at=INFO 
request_id=aa3101c9-0502-4015-86d3-2fc420815240 
logger=db.backends.postgis.base Opened connection 1105752 user_agent="X"
 Nov 28 17:28:25 app/web.2 at=INFO 
request_id=aa3101c9-0502-4015-86d3-2fc420815240 
logger=log_request_id.middleware method=GET 
path=/admin/myapp/item/1/change/ status=200 user_agent="X"

I'd love to receive any feedback or suggestions on further debugging here. 
I am starting to think if this might be some internal Django issue...

With regards,
Rustem Saiargaliev

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/761c8e14-75f0-40b3-8ef4-703390f22966n%40googlegroups.com.

Reply via email to