Hi all,

I'm doing some testing of Postgres 9.0 archiving and streaming replication 
between a couple of Solaris 10 servers. Recently I was trying to test how well 
the standby server catches up after an outage, and a question arose.

It seems that if the standby is unreachable when the primary attempts WAL 
archiving, the primary tries the copy three times, then logs a warning that the 
WAL file could not be archived because there were too many failures. 
See:

ssh: connect to host 172.18.131.212 port 22: Connection timed out
lost connection
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: scp pg_xlog/000000010000000000000006 
[email protected]:/postgres/postgres/9.0-pgdg/primary_archive
ssh: connect to host 172.18.131.212 port 22: Connection timed out
lost connection
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: scp pg_xlog/000000010000000000000006 
[email protected]:/postgres/postgres/9.0-pgdg/primary_archive
ssh: connect to host 172.18.131.212 port 22: Connection timed out
lost connection
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: scp pg_xlog/000000010000000000000006 
[email protected]:/postgres/postgres/9.0-pgdg/primary_archive
WARNING:  transaction log file "000000010000000000000006" could not be 
archived: too many failures


But then the primary repeats this cycle of three attempts another 49 times! So 
50 cycles of 3, or 150 attempts in all.
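
For anyone wanting to check the count on their own system, here is a rough 
sketch that tallies the failures per WAL segment from the server log. It 
assumes the log lines look like the excerpt above; the function name 
count_archive_failures is made up for illustration, and a log_line_prefix 
setting may put extra text in front of each line.

```python
import re
from collections import Counter

# Patterns follow the log lines quoted in this post. They are searched
# anywhere in the line, so a log_line_prefix in front should not matter.
FAILED_CMD = re.compile(
    r"DETAIL:\s+The failed archive command was: .*?pg_xlog/([0-9A-F]{24})"
)
GAVE_UP = re.compile(
    r'WARNING:\s+transaction log file "([0-9A-F]{24})" could not be'
)


def count_archive_failures(lines):
    """Return (attempts, warnings), two Counters keyed by WAL segment name.

    attempts counts failed archive_command runs; warnings counts the
    "too many failures" messages logged after each group of failures.
    """
    attempts, warnings = Counter(), Counter()
    for line in lines:
        m = FAILED_CMD.search(line)
        if m:
            attempts[m.group(1)] += 1
        m = GAVE_UP.search(line)
        if m:
            warnings[m.group(1)] += 1
    return attempts, warnings
```

Feeding it the lines of the server log (for example, an open file object) 
gives the number of failed attempts and the number of warnings for each 
segment, which makes it easy to see how many cycles actually ran.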

What I need to know is whether these retry counts are configurable. Can the 
retries be limited by time instead? How long before the primary stops retrying 
altogether?

Any help appreciated. Thanks!
Dan
-- 
This message posted from opensolaris.org
_______________________________________________
databases-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/databases-discuss
