Hi Team,

PostgreSQL version: 13.8
Issue: Logical replication failing with SSL SYSCALL error
Priority: High

We are migrating our database using logical replication, and all of a
sudden the errors below appeared in the source and target logs. They give
us no clue about the cause.

*Logs from Source:-*
LOG:  could not send data to client: Connection reset by peer
STATEMENT:  COPY public.test TO STDOUT
FATAL:  connection to client lost
STATEMENT:  COPY public.test TO STDOUT

*Logs from Target:-*
2023-04-15 19:07:02 UTC::@:[1250]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:07:02 UTC::@:[1250]:CONTEXT: COPY test, line 365326932
2023-04-15 19:07:03 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1250) exited with exit code 1
2023-04-15 19:07:03 UTC::@:[7155]:LOG: logical replication table synchronization worker for subscription " sub_tables_2_180", table "test" has started
2023-04-15 19:12:05 UTC:10.144.19.34(33276):postgres@webadmit_staging:[7112]:WARNING: there is no transaction in progress
2023-04-15 19:14:08 UTC:10.144.19.34(33324):postgres@webadmit_staging:[6052]:LOG: could not receive data from client: Connection reset by peer
2023-04-15 19:17:23 UTC::@:[2112]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:17:23 UTC::@:[1089]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:17:23 UTC::@:[2556]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2556) exited with exit code 1
2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2112) exited with exit code 1
2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1089) exited with exit code 1
2023-04-15 19:17:23 UTC::@:[7287]:LOG: logical replication apply worker for subscription "sub_tables_2_180" has started
2023-04-15 19:17:23 UTC::@:[7288]:LOG: logical replication apply worker for subscription "sub_tables_3_192" has started
2023-04-15 19:17:23 UTC::@:[7289]:LOG: logical replication apply worker for subscription "sub_tables_1_180" has started

Right after this error, all the other replication slots are disabled for
some time and then come back online, and the COPY command reappears in
pg_stat_activity under a new PID.

I have a few questions about this:

   1. What is the exact reason for the disconnection? (Some articles blame
   memory, others the network.)
   2. Will it lead to data inconsistency?
   3. Will the COPY command running under the new PID copy the entire test
   table again from the beginning?

Please help, we are stuck here.
-- 
Thanks and Regards,
Shaurya Jain
email:- 12345shau...@gmail.com
*Mobile:- +91-8802809405*
LinkedIn:- https://www.linkedin.com/in/shaurya-jain-74353023
