Hello! (Sorry again for the late reply. See below for comments.)

On Fri, Aug 02, 2013 at 01:16:53PM +0800, Sepherosa Ziehau wrote:

> Here is another round of SO_REUSEPORT support. The plot is changed a
> little bit to allow smooth configure reloading and binary upgrading.
> Here is what happens when so_reuseport is enable (this does not affect
> single process model):
> - Master creates the listen sockets w/ SO_REUSEPORT, but does not configure
> them
> - The first worker process will inherit the listen sockets created by
> master and configure them
> - After master forked the first worker process all listen sockets are closed
> - The rest of the workers will create their own listen sockets w/ SO_REUSEPORT
> - During binary upgrade, listen sockets are no longer passed through
> environment variables, since new master will create its own listen
> sockets. Well, the old master actually does not have any listen
> sockets opened :).
>
> The idea behind this plot is that at any given time, there is always
> one listen socket left, which could inherit the syncaches and pending
> sockets on the to-be-closed listen sockets. The inheritance itself is
> handled by the kernel; I implemented this inheritance for DragonFlyBSD
> recently
> (http://gitweb.dragonflybsd.org/dragonfly.git/commit/02ad2f0b874fb0a45eb69750219f79f5e8982272).
> I am not tracking Linux's code, but I think Linux side will
> eventually get (or already got) the proper fix.
>
> The patch itself:
> http://leaf.dragonflybsd.org/~sephe/ngx_soreuseport3.diff
>
> Configuration reloading and binary upgrading will not be interfered as
> w/ the first 2 patches.
>
> Binary upgrading reverting method 1 ("Send the HUP signal to the old
> master process. ...") will not be interfered as w/ the first 2
> patches. There still could be some glitch (but not that worse as w/
> the first 2 patches) if binary upgrading reverting method 2 ("Send the
> TERM signal to the new master process. ...") is used. I think we
> probably just need to mention that in the document.

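For anyone following the thread without the patch at hand, here is a rough sketch of what "each worker creates its own listen socket w/ SO_REUSEPORT" means at the socket level. It is Python, not nginx C, and it is not taken from the patch: the function names `open_reuseport_listener` and `open_worker_listeners` are made up for illustration, and it assumes a platform where Python's `socket` module exposes `SO_REUSEPORT` (Linux 3.9+ or a BSD).

```python
import socket


def open_reuseport_listener(host, port, backlog=128):
    """Create one listening TCP socket with SO_REUSEPORT set before bind().

    Hypothetical helper for illustration only. On Linux every socket that
    shares the port must set the option before bind().
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        s.listen(backlog)
    except OSError:
        s.close()
        raise
    return s


def open_worker_listeners(host, workers):
    """Open `workers` independent listen sockets that all share one port.

    The first socket binds to port 0 so the kernel picks a free port;
    the remaining sockets bind to that same port. In a real server each
    socket would be created inside its own worker process.
    """
    first = open_reuseport_listener(host, 0)
    port = first.getsockname()[1]
    listeners = [first]
    try:
        for _ in range(workers - 1):
            listeners.append(open_reuseport_listener(host, port))
    except OSError:
        for s in listeners:
            s.close()
        raise
    return listeners
```

Calling `open_worker_listeners("127.0.0.1", 3)` yields three distinct descriptors on one port, and the kernel decides which of them receives each incoming connection. What happens to connections queued on a listener that gets closed is exactly the kernel-side inheritance question from the quoted text, and is not shown here.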
While this looks better than what the previous patches did (mostly because the inheritance is handled by the kernel), it still looks very fragile to me. In particular, I really dislike the trick of making the first worker process special. It should probably either be left in the "nothing is guaranteed" state (with some understanding of what will happen in various common situations such as reconfiguration, upgrade, and switching so_reuseport on/off), or some way should be found to make things less tricky.

An additional question to consider: what happens with security checks? Linux seems to require the process user id to match on SO_REUSEPORT sockets, and I would expect this to fail if there are sockets opened both in master and in worker processes; privileged port checks might cause problems as well.

(We've also discussed this here in the office several times, and the general consensus seems to be that SO_REUSEPORT isn't really a good interface for TCP balancing. It would be much easier for everyone if the normal workflow with inherited listen socket descriptors just worked. Especially given that in the nginx case it's mostly about benchmarking, since in real life the load distribution between worker processes is good enough.)

-- 
Maxim Dounin
http://nginx.org/en/donation.html

_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel