Hi Yann,

As we pointed out in our original discussion thread, we dropped the child 
process implementation due to the kernel defect with changing the number of 
open sockets. Now, we quickly tested this child process implementation 
(prefork) with our workload on a modern Xeon dual sockets server and most 
recent 3.13.6 kernel again.

1. We do not see "connection reset" errors during the run (ramp up and steady 
stay) any more. However, we noticed that our workload cannot ramp down and 
terminate on its own with this child process implementation. This never 
happened before with either "out of box" httpd or the parent process 
implementation. After manually force shutdown the workload, we saw these 
"connection reset" errors again.

2. During the run, we noticed that there are tons of "read timed out" errors. 
These errors not only happen when the system is highly utilized, it even 
happens when system is only 10% utilized. The response time was high.

3. Compared to parent process implementation, we found child process 
implementation results in significantly higher (up to 10X) response time (The 
read timed out errors are not counted in the result) at different CPU 
utilization levels. At peak performance level, it has ~22% less throughput with 
tons of "connection reset" errors in additional to "read timed out" errors. 
Parent process implementation does not have errors.

We think the reason of above findings may be caused by: 1. Too many open 
sockets created by the children processes; and/or 2. Parent process does not 
have control, or maybe 3. Kernel defect is not fully addressed. On the other 
hand, the parent implementation keeps minimal number of open sockets that takes 
advantage of SO_REUSEPORT and keeps the environment more controllable.

We are currently modifying the code based on all the feedbacks from the 
community with the original parent process implementation which also includes 
separating the original patch between with and without SO_REUSEPORT support. 
This would make SO_REUSEPORT patch cleaner and simpler.

Thanks,
Yingqi (people at work also call me Lucy:))


From: Yann Ylavic [mailto:ylavic....@gmail.com]
Sent: Friday, March 07, 2014 9:07 AM
To: httpd
Subject: SO_REUSEPORT in the children processes

Hi all,
the patch about SO_REUSEPORT proposed by Yingqi Lu in [1] and discussed in [2] 
uses listeners buckets to address a defect [3] in the current linux 
implementation (his patch goes beyond SO_REUSEPORT though, and suggests a new 
MPM even when the option is not available).
Should this defect be addressed by linux folks, the event/worker/prefork MPMs 
could take full advantage of the option (linux-3.9+) with quite simple 
modifications of the current code.
I'm proposing here the corresponding patch.

The idea is to re-create and re-bind/listen the parent's listeners sockets for 
each child process, when starting, before dropping priviledges.
For this, the patch introduces a new ap_reopen_listeners() function (meant to 
be called by each child) to do the job on the inherited listeners. It does 
nothing unless HAVE_SO_REUSEPORT is defined.

The advantage of this approach is that no accept mutex is needed anymore (each 
child has its own sockets), hence the SAFE_ACCEPT macros can do nothing when 
HAVE_SO_REUSEPORT is defined.
The new (incoming) connections are evenly distributed accross the children for 
all the listners (as assured by Linux when SO_REUSEPORT is used).
I'm proposing the patch here so that everyone can figure out whether 
SO_REUSEPORT per se needs its own MPM or not (once/if the defect is fixed).
The option seems to be quite easily pluggable to existing MPMs (no ABI change), 
and I don't see an advantage to not using it when available (and working).

Also, FreeBSD has an implementation of SO_REUSEPORT,
however
I couldn't find whether it has the same scheduling garantee
or not
(at least I guess the accept mutex can be avoided too).

Regarding the linux kernel defect, is someone aware of a fix/work on that in 
the latest versions?

Finally, about the accept mutex, mpm event seems to work well without any, why 
prefork and worker would need one (both poll() all the listeners in a single 
thread, while other children can do the same)?
The patch follows and is attached.
It can already be tested with a workaround against the defect: don't let 
perform_idle_server_maintenance() create/stop children after startup (eg. 
StartServers, ServerLimit, Min/MaxSpareServers using the same value).

Thoughts, feedbacks welcome.

Regards,
Yann.

[1] https://issues.apache.org/bugzilla/show_bug.cgi?id=55897#c7
[2] 
http://mail-archives.apache.org/mod_mbox/httpd-bugs/201312.mbox/%3cbug-55897-7...@https.issues.apache.org/bugzilla/%3E
[3] http://lwn.net/Articles/542629/ and http://lwn.net/Articles/542738/

Index: server/mpm/event/event.c
===================================================================
--- server/mpm/event/event.c    (revision 1575322)
+++ server/mpm/event/event.c    (working copy)
@@ -2356,6 +2356,13 @@ static void child_main(int child_num_arg)
     /*stuff to do before we switch id's, so we have permissions. */
     ap_reopen_scoreboard(pchild, NULL, 0);

+    rv = ap_reopen_listeners(pchild, num_listensocks);
+    if (rv != APR_SUCCESS) {
+        ap_log_error(APLOG_MARK, APLOG_EMERG, rv, ap_server_conf, APLOGNO()
+                     "Couldn't re-open listeners");
+        clean_child_exit(APEXIT_CHILDFATAL);
+    }
+
     if (ap_run_drop_privileges(pchild, ap_server_conf)) {
         clean_child_exit(APEXIT_CHILDFATAL);
     }
Index: server/mpm/prefork/prefork.c
===================================================================
--- server/mpm/prefork/prefork.c    (revision 1575322)
+++ server/mpm/prefork/prefork.c    (working copy)
@@ -271,7 +271,9 @@ static void accept_mutex_off(void)
  * multiple Listen statements.  Define SINGLE_LISTEN_UNSERIALIZED_ACCEPT
  * when it's safe in the single Listen case.
  */
-#ifdef SINGLE_LISTEN_UNSERIALIZED_ACCEPT
+#if HAVE_SO_REUSEPORT
+#define SAFE_ACCEPT(stmt)
+#elif defined(SINGLE_LISTEN_UNSERIALIZED_ACCEPT)
 #define SAFE_ACCEPT(stmt) do {if (ap_listeners->next) {stmt;}} while(0)
 #else
 #define SAFE_ACCEPT(stmt) do {stmt;} while(0)
@@ -536,6 +538,13 @@ static void child_main(int child_num_arg)
         clean_child_exit(APEXIT_CHILDFATAL);
     }

+    status = ap_reopen_listeners(pchild, num_listensocks);
+    if (status != APR_SUCCESS) {
+        ap_log_error(APLOG_MARK, APLOG_EMERG, status, ap_server_conf, APLOGNO()
+                     "Couldn't re-open listeners");
+        clean_child_exit(APEXIT_CHILDFATAL);
+    }
+
     if (ap_run_drop_privileges(pchild, ap_server_conf)) {
         clean_child_exit(APEXIT_CHILDFATAL);
     }
Index: server/mpm/worker/worker.c
===================================================================
--- server/mpm/worker/worker.c    (revision 1575322)
+++ server/mpm/worker/worker.c    (working copy)
@@ -220,7 +220,9 @@ static apr_os_thread_t *listener_os_thread;
 /* Locks for accept serialization */
 static apr_proc_mutex_t *accept_mutex;

-#ifdef SINGLE_LISTEN_UNSERIALIZED_ACCEPT
+#if HAVE_SO_REUSEPORT
+#define SAFE_ACCEPT(stmt)
+#elif defined(SINGLE_LISTEN_UNSERIALIZED_ACCEPT)
 #define SAFE_ACCEPT(stmt) (ap_listeners->next ? (stmt) : APR_SUCCESS)
 #else
 #define SAFE_ACCEPT(stmt) (stmt)
@@ -1228,6 +1230,13 @@ static void child_main(int child_num_arg)
     /*stuff to do before we switch id's, so we have permissions.*/
     ap_reopen_scoreboard(pchild, NULL, 0);

+    rv = ap_reopen_listeners(pchild, num_listensocks);
+    if (rv != APR_SUCCESS) {
+        ap_log_error(APLOG_MARK, APLOG_EMERG, rv, ap_server_conf, APLOGNO()
+                     "Couldn't re-open listeners");
+        clean_child_exit(APEXIT_CHILDFATAL);
+    }
+
     rv = SAFE_ACCEPT(apr_proc_mutex_child_init(&accept_mutex,
                                                
apr_proc_mutex_lockfile(accept_mutex),
                                                pchild));
Index: server/listen.c
===================================================================
--- server/listen.c    (revision 1575322)
+++ server/listen.c    (working copy)
@@ -60,6 +60,9 @@ static apr_status_t make_sock(apr_pool_t *p, ap_li
 #endif
     apr_status_t stat;

+    /* until further notice */
+    server->sd = NULL;
+
 #ifndef WIN32
     stat = apr_socket_opt_set(s, APR_SO_REUSEADDR, one);
     if (stat != APR_SUCCESS && stat != APR_ENOTIMPL) {
@@ -125,6 +128,23 @@ static apr_status_t make_sock(apr_pool_t *p, ap_li
 #endif

     if (do_bind_listen) {
+#if HAVE_SO_REUSEPORT
+        {
+            int yes = 1;
+            apr_os_sock_t sd = -1;
+            apr_os_sock_get(&sd, s);
+            if (setsockopt(sd, SOL_SOCKET, SO_REUSEPORT,
+                           (void *)&yes, sizeof(int)) == -1) {
+                stat = errno;
+                ap_log_perror(APLOG_MARK, APLOG_CRIT, stat, p, APLOGNO()
+                              "make_sock: for address %pI, setsockopt: "
+                              "(SO_REUSEPORT)", server->bind_addr);
+                apr_socket_close(s);
+                return stat;
+            }
+        }
+#endif
+
 #if APR_HAVE_IPV6
         if (server->bind_addr->family == APR_INET6) {
             stat = apr_socket_opt_set(s, APR_IPV6_V6ONLY, v6only_setting);
@@ -727,6 +747,34 @@ AP_DECLARE(int) ap_setup_listeners(server_rec *s)
     return num_listeners;
 }

+AP_DECLARE(apr_status_t) ap_reopen_listeners(apr_pool_t *p, int num)
+{
+#if HAVE_SO_REUSEPORT
+    ap_listen_rec *lr;
+
+    for (lr = ap_listeners; lr && num; lr = lr->next, --num) {
+        apr_socket_t *sd = lr->sd, *s;
+
+        rv = apr_socket_create(&s, lr->bind_addr->family, SOCK_STREAM, 0, p);
+        if (rv != APR_SUCCESS) {
+            return rv;
+        }
+        lr->sd = s;
+
+        rv = make_sock(p, lr, 1);
+        if (rv != APR_SUCCESS) {
+            /* make_sock() has closed lr->sd,
+             * restore the original socket */
+            lr->sd = sd;
+            return rv;
+        }
+        /* not needed anymore */
+        apr_socket_close(sd);
+    }
+#endif
+    return APR_SUCCESS;
+}
+
 AP_DECLARE_NONSTD(void) ap_close_listeners(void)
 {
     ap_listen_rec *lr;
Index: include/ap_listen.h
===================================================================
--- include/ap_listen.h    (revision 1575322)
+++ include/ap_listen.h    (working copy)
@@ -92,6 +92,15 @@ AP_DECLARE(void) ap_listen_pre_config(void);
 AP_DECLARE(int) ap_setup_listeners(server_rec *s);

 /**
+ * Loop through the global ap_listen_rec list and reopen up to @a num sockets
+ * when applicable (eg. HAVE_SO_REUSEPORT to reuse ports).
+ * @param p The pool reopen the sockets on
+ * @param num The number of sockets to reopen (all if negative)
+ * @return APR_SUCCESS or an APR error
+ */
+AP_DECLARE(apr_status_t) ap_reopen_listeners(apr_pool_t *p, int num);
+
+/**
  * Loop through the global ap_listen_rec list and close each of the sockets.
  */
 AP_DECLARE_NONSTD(void) ap_close_listeners(void);
[EOS]

Reply via email to