Hello, I maintain a PostgreSQL cluster with streaming replication for a PHP-based web app. For a few days I have been trying to get rid of errors in my setup:

       Application server              DB server
| PHP -> pgbouncer -> haproxy | -> | postgresql |

pgbouncer pools connections from PHP (session-based), and haproxy load-balances and handles failover across 3 backend postgresql servers.

Every ~10 min haproxy drops a connection and pgbouncer reports an error.

application logs:
failed to execute the SQL statement: SQLSTATE[08P01]: <<Unknown error>>: 7 ERROR: server conn crashed?

syslog:
Mar 4 22:16:12 app1 pgbouncer[15572]: C-0x1d0c130: mydb/pgsql@unix:5432 Pooler Error: server conn crashed?

When I remove haproxy from this setup, or replace it with any other TCP balancer (balancer-ng, for example), everything works fine!

I have tried almost everything I can imagine:
- changing the connections between php, pgbouncer and haproxy to tcp/ip or unix-socket
- changing timeouts, connection lifetimes, keepalive, additional TCP options, pool modes
- downgrading to older versions of pgbouncer and haproxy
- reducing the number of TIME_WAIT sockets by switching other components to unix-socket where possible

Any ideas what to look for?

software versions:
php5 5.4.23
pgbouncer 1.5.4
haproxy 1.5dev22

pgbouncer config:
==============
[databases]
* = host=127.0.0.1 port=6432

[pgbouncer]
syslog = 1
pidfile = /var/run/postgresql/pgbouncer.pid
listen_addr = 127.0.0.1
listen_port = 5432
unix_socket_dir = /var/run/postgresql
listen_backlog = -1
auth_type = trust
auth_file = /etc/pgbouncer/userlist.txt
admin_users =
stats_users = pgsql
pool_mode = session
server_reset_query = DISCARD ALL;
ignore_startup_parameters = application_name
server_check_query = select 1
server_check_delay = 10
max_client_conn = 5120
default_pool_size = 16
reserve_pool_size = 0
reserve_pool_timeout = 0
log_connections = 0
log_disconnections = 0
log_pooler_errors = 1
server_lifetime = 1200
server_idle_timeout = 60
query_timeout = 0
client_login_timeout = 60
==============

haproxy config:
==============
defaults
    option splice-auto
    option tcpka
    timeout connect 5s
    timeout client 2s
    timeout server 10s

listen stats :18080
    mode http
    stats enable
    stats uri /

listen pgsql 127.0.0.1:6432
    maxconn 3000
    mode tcp
    balance roundrobin
    option tcp-smart-accept
    option tcp-smart-connect
    option pgsql-check user postgres
    server slave1 10.0.0.1:5432
    server slave2 10.0.0.2:5432
    server slave3 10.0.0.3:5432
==============
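
In case it helps anyone reproduce this, below is a minimal sketch (Python, not part of my setup) for measuring how long an idle TCP connection survives on a given listener, for example the haproxy frontend on 127.0.0.1:6432 compared with a backend reached directly. The function name idle_lifetime and its parameters are made up for illustration. It only opens a socket and waits; it does not speak the PostgreSQL protocol, so it shows when the peer closes an idle connection, not why.

```python
import select
import socket
import time


def idle_lifetime(host, port, max_wait=30.0, step=0.5):
    """Open a TCP connection, send nothing, and report how many seconds
    pass before the peer closes it. Returns None if the connection is
    still open after max_wait seconds. (Hypothetical helper.)"""
    with socket.create_connection((host, port), timeout=5) as sock:
        start = time.monotonic()
        while time.monotonic() - start < max_wait:
            # Wait up to `step` seconds for the socket to become readable.
            readable, _, _ = select.select([sock], [], [], step)
            if not readable:
                continue
            try:
                # Peek so that an orderly close shows up as an empty read.
                data = sock.recv(1, socket.MSG_PEEK)
            except OSError:
                # Connection reset by the peer.
                return time.monotonic() - start
            if data == b"":
                # Peer closed the connection.
                return time.monotonic() - start
            # Peer sent data; discard it and keep waiting.
            sock.recv(4096)
    return None
```

Calling idle_lifetime("127.0.0.1", 6432) against the haproxy listener and then against one backend directly (10.0.0.1:5432 in the config above) should show whether idle connections are cut somewhere in between, and after roughly how many seconds.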