[HACKERS] Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

Alexey Vasiliev Tue, 30 Dec 2014 04:11:55 -0800

 Hello.

Thanks, I understand now that I needed to look at another part of the code. 
I hope the changes are right this time.


To avoid modifying the existing pg_usleep calculation, I changed the 
restore_command_retry_interval value to seconds (not milliseconds). As a 
result, the minimum value is 1 second.
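
For illustration, the sleep calculation can be sketched as a standalone helper. This is only a sketch: the function name retry_sleep_usec is hypothetical and does not exist in the patch, and it assumes that all times are whole seconds as returned by time(NULL).

```c
#include <time.h>

/*
 * Hypothetical helper (not part of the patch): number of microseconds left
 * to sleep before the next restore_command attempt.  All arguments are in
 * whole seconds, matching the granularity of time(NULL).
 */
static long
retry_sleep_usec(time_t now, time_t last_fail_time, int retry_interval)
{
	long		elapsed = (long) (now - last_fail_time);

	if (elapsed >= retry_interval)
		return 0L;			/* interval already elapsed, retry at once */

	/* same arithmetic as the pg_usleep() argument in the patch */
	return 1000000L * (retry_interval - elapsed);
}
```

Because the interval is multiplied by 1000000L, the product would overflow on platforms where long is 32 bits once the remaining time exceeds about 2147 seconds, so an upper bound on the parameter may be worth considering.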


Mon, 29 Dec 2014 00:15:03 +0900 from Michael Paquier <[email protected]>:
>On Sat, Dec 27, 2014 at 3:42 AM, Alexey Vasiliev < [email protected] > wrote:
>> Thanks for suggestions.
>>
>> Patch updated.
>
>Cool, thanks. I just had an extra look at it.
>
>+        This is useful, if I using for restore of wal logs some
>+        external storage (like AWS S3) and no matter what the slave database
>+        will lag behind the master. The problem, what for each request to
>+        AWS S3 need to pay, what is why for N nodes, which try to get next
>+        wal log each 5 seconds will be bigger price, than for example each
>+        30 seconds.
>I reworked this portion of the docs, it is rather incorrect as the
>documentation should not use first-person subjects, and I don't
>believe that referencing any commercial products is a good thing in
>this context.
>
>+# specifies an optional timeout after nonzero code of restore_command.
>+# This can be useful to increase/decrease number of a restore_command calls.
>This is still referring to a timeout. That's not good. And the name of
>the parameter at the top of this comment block is missing.
>
>+static int     restore_command_retry_interval = 5000L;
>I think that it would be more adapted to set that to 5000, and
>multiply by 1L. I am also wondering about having a better lower bound,
>like 100ms to avoid some abuse with this feature in the retries?
>
>+                               ereport(ERROR,
>+                                       (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>+                                                errmsg("\"%s\" must be bigger zero",
>+                                       "restore_command_retry_interval")));
>I'd rather rewrite that to "must have a strictly positive value".
>
>-                                        * Wait for more WAL to arrive. Time out after 5 seconds,
>+                                        * Wait for more WAL to arrive. Time out after
>+                                        * restore_command_retry_interval (5 seconds by default),
>                                         * like when polling the archive, to react to a trigger
>                                         * file promptly.
>                                         */
>                                        WaitLatch(&XLogCtl->recoveryWakeupLatch,
>                                                          WL_LATCH_SET | WL_TIMEOUT,
>-                                                         5000L);
>+                                                         restore_command_retry_interval);
>I should have noticed earlier, but in its current state your patch
>actually does not work. What you are doing here is tuning the time
>process waits for WAL from stream. In your case what you want to
>control is the retry time for a restore_command in archive recovery,
>no?
>-- 
>Michael
>
>
>-- 
>Sent via pgsql-hackers mailing list ([email protected])
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Alexey Vasiliev

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index ef78bc0..38420a5 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,26 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
       </listitem>
      </varlistentry>
 
+     <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval">
+      <term><varname>restore_command_retry_interval</varname> (<type>integer</type>)
+      <indexterm>
+        <primary><varname>restore_command_retry_interval</> recovery parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        If <varname>restore_command</> returns a nonzero exit status code, retry
+        the command after the interval of time specified by this parameter.
+        The default value is <literal>5s</>.
+       </para>
+       <para>
+        This is useful when WAL segments are restored from external
+        storage and the standby will lag behind the master in any case;
+        a longer interval then reduces the number of restore_command calls.
+       </para>
+      </listitem>
+     </varlistentry>
+
     </variablelist>
 
   </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index b777400..5b63f60 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional interval to wait before retrying restore_command after it returns a nonzero exit status code.
+# This can be useful to increase or decrease the number of restore_command calls.
+#
+#restore_command_retry_interval = 5s
+#
 #---------------------------------------------------------------------------
 # RECOVERY TARGET PARAMETERS
 #---------------------------------------------------------------------------
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e5dddd4..83a6db0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -235,6 +235,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int	recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int 	restore_command_retry_interval = 5;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -4881,6 +4882,28 @@ readRecoveryCommandFile(void)
 					(errmsg_internal("trigger_file = '%s'",
 									 TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_command_retry_interval") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_S,
+						   &hintmsg))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("parameter \"%s\" requires a temporal value",
+								"restore_command_retry_interval"),
+						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+					(errmsg_internal("restore_command_retry_interval = '%s'", item->value)));
+
+			if (restore_command_retry_interval < 1)
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("\"%s\" must have a strictly positive value",
+					"restore_command_retry_interval")));
+			}
+		}
 		else if (strcmp(item->name, "recovery_min_apply_delay") == 0)
 		{
 			const char *hintmsg;
@@ -10495,13 +10518,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 					 * machine, so we've exhausted all the options for
 					 * obtaining the requested WAL. We're going to loop back
 					 * and retry from the archive, but if it hasn't been long
-					 * since last attempt, sleep 5 seconds to avoid
-					 * busy-waiting.
+					 * since last attempt, sleep restore_command_retry_interval
+					 * (by default 5 seconds) to avoid busy-waiting.
 					 */
 					now = (pg_time_t) time(NULL);
-					if ((now - last_fail_time) < 5)
+					if ((now - last_fail_time) < restore_command_retry_interval)
 					{
-						pg_usleep(1000000L * (5 - (now - last_fail_time)));
+						pg_usleep(1000000L * (restore_command_retry_interval - (now - last_fail_time)));
 						now = (pg_time_t) time(NULL);
 					}
 					last_fail_time = now;
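
The parsing and range check that the patch adds to readRecoveryCommandFile() can be tried outside the server with a small stand-in. This is a sketch under stated assumptions: parse_retry_interval is a hypothetical function, it is not PostgreSQL's parse_int(), and it only accepts a plain integer or one followed by "s", "min" or "h". It mirrors the rule in the patch that the value is stored in seconds and must be at least 1.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the parse_int() call plus range check in the
 * patch.  Stores the interval in seconds and returns true on success.
 * Returns false for malformed input, unknown units, or values below the
 * 1 second minimum.
 */
static bool
parse_retry_interval(const char *value, int *result_secs)
{
	char	   *end;
	long		num;
	long		mult;

	errno = 0;
	num = strtol(value, &end, 10);
	if (end == value || errno == ERANGE)
		return false;		/* no digits, or out of range for long */

	while (*end == ' ')
		end++;				/* allow "5 s" as well as "5s" */

	if (*end == '\0' || strcmp(end, "s") == 0)
		mult = 1;
	else if (strcmp(end, "min") == 0)
		mult = 60;
	else if (strcmp(end, "h") == 0)
		mult = 3600;
	else
		return false;		/* unit not handled by this sketch */

	if (num < 1 || num > INT_MAX / mult)
		return false;		/* must be strictly positive and fit an int */

	*result_secs = (int) (num * mult);
	return true;
}
```

With this stand-in, "5s" and "30" are accepted as 5 and 30 seconds, while "0" and "-3s" are rejected, which corresponds to the "must have a strictly positive value" error in the patch.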

