I agree, SIGKILL is typically the last resort.

On 3 Mar 2015 00:49, "Reitzel, Charles" <[email protected]> wrote:
> My bad. Too long away from sockets since cleaning up those shutdown handlers. Your point is well taken: on the server side, the risk of consuming a stray echo packet is fairly low (but non-zero, if you’ve ever spent any quality time with tcpdump/wireshark).
>
> Still, in a production setting, SIGKILL (aka “kill -9”) should be a last resort after more reasonable methods (e.g. SIGINT, SIGTERM) have failed.
>
> *From:* Ramkumar R. Aiyengar [mailto:[email protected]]
> *Sent:* Monday, March 02, 2015 7:00 PM
> *To:* [email protected]
> *Subject:* RE: reuseAddress default in Solr jetty.xml
>
> No, reuseAddress doesn't allow you to have two processes, old and new, listening on the same port. There's no option that allows you to do that.
>
> TL;DR: This can happen when you have a connection to a server that gets killed hard and comes back up immediately.
>
> So here's what happens.
>
> When a server shuts down normally, it triggers an active close on all of its open TCP connections. That starts a three-way message exchange with the remote end (FIN, FIN+ACK, ACK), at the end of which the socket is closed and the kernel keeps it in a TIME_WAIT state for a few minutes in the background (this depends on the OS; the maximum tends to be 4 minutes). This allows delayed or reordered packets from the old connection to still reach the machine, just in case. Now, typically, if the server restarts within that period and tries to bind to the same port again, the kernel is smart enough not to complain about the existing socket in TIME_WAIT, because it knows the last sequence number it used for the final message in the previous process. Since sequence numbers are always increasing, it can reject any message before that sequence number, as a new process has now taken the port.
>
> Trouble is with abnormal shutdown. There's no time for a proper goodbye, so the kernel marks the socket to respond to remote packets with a rude RST (reset).
> Since there has been no goodbye with the remote end, it also doesn't know the last sequence number to delineate old packets if a new process binds to the same port. Hence, by default, it refuses to let a new process bind to that port for the TIME_WAIT period, to avoid the off chance that a stray packet gets picked up by the new process and utterly confuses it. By setting reuseAddress, you are essentially waiving this protection. Note that the possibility of confusion is unbelievably minuscule in the first place (both the source and destination host:port have to be the same, and the client port is generally randomly allocated). If the port we are talking about is a local port, it's almost impossible -- you have bigger problems if a TCP packet is lost or delayed within the same machine!
>
> As to Shawn's point, for Solr's stop port you essentially need to be actively trying to shut down the server using the stop port, or be within a few minutes of such an attempt, when the server is killed. The server merely being killed without any active connection to it is not going to cause this issue.
>
> Hi Ram,
>
> It appears the problem is that the old solr/jetty process is actually still running when the new solr/jetty process is started. That’s the problem that needs fixing.
>
> This is not a rare problem in systems with worker threads dedicated to different tasks. These threads need to wake up in response to the shutdown signal/command, as well as to their normal inputs.
>
> It’s a bug I’ve created and fixed a couple of times over the years … :-) I wouldn’t know where to start with Solr. But, as I say, reusing the port is a band-aid. I’ve yet to see a case where it is the best solution.
>
> best,
> Charlie
>
> *From:* Ramkumar R. Aiyengar [mailto:[email protected]]
> *Sent:* Saturday, February 28, 2015 8:15 PM
> *To:* [email protected]
> *Subject:* Re: reuseAddress default in Solr jetty.xml
>
> Hey Charles, see my explanation above on why this is needed.
> If Solr has to be killed, it would generally be restarted immediately. This would normally not be the case, except when things are misconfigured or there is a bug, but not setting reuseAddress makes the impact worse.
>
> In any case, it turns out that reuseAddress is already true by default for the connectors we use, so that isn't really the issue. More specifically, the issue is that the stop port doesn't set it, so the main port itself binds just fine on a restart, but the stop port fails to bind -- and there's currently no way to configure that in Jetty.
>
> Based on my question on the Jetty mailing list, I have now created an issue for them:
>
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=461133
>
> On Fri, Feb 27, 2015 at 3:03 PM, Reitzel, Charles <[email protected]> wrote:
>
> Disclaimer: I’m not a Solr committer. But, as a developer, I’ve never seen a good case for reusing the listening port. Better to find and fix the root cause of the zombie state (or just slow shutdown, sometimes) and release the port.
>
> *From:* Mark Miller [mailto:[email protected]]
> *Sent:* Thursday, February 26, 2015 5:28 PM
> *To:* [email protected]
> *Subject:* Re: reuseAddress default in Solr jetty.xml
>
> +1
>
> - Mark
>
> On Thu, Feb 26, 2015 at 1:54 PM Ramkumar R. Aiyengar <[email protected]> wrote:
>
> The jetty.xml we currently ship by default doesn't set reuseAddress=true. If you are having a bad GC day, with things going OOM and Solr not even being able to shut down cleanly (or the oom_solr.sh script killing it), whatever external service management mechanism you have is probably going to try to respawn it and fail with the default config, because the ports will be in TIME_WAIT. I guess there's the usual disclaimer about reuseAddress allowing stray packets to reach the restarted server, but it sounds like at least the default should be true.
>
> I can raise a JIRA, but just wanted to check if anyone has any opinions either way.
>
> *************************************************************************
> This e-mail may contain confidential or privileged information.
> If you are not the intended recipient, please notify the sender
> immediately and then delete it.
>
> TIAA-CREF
> *************************************************************************
>
> --
>
> Not sent from my iPhone or my Blackberry or anyone else's
>
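To make the reuseAddress discussion in this thread concrete, here is a minimal Java sketch using plain `java.net.ServerSocket`. It does not use Jetty and does not touch the stop-port configuration. The class and method names (`ReuseAddressDemo`, `listen`, `runOldServer`, `restartOn`) are hypothetical, and the sketch assumes loopback (127.0.0.1).

The "old server" accepts one connection and closes it from its own side first. That is an active close, so the server side of the connection lingers in TIME_WAIT. The "new server" then immediately rebinds the same port with `setReuseAddress(true)`, which is Java's name for SO_REUSEADDR. The option must be set before `bind()`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ReuseAddressDemo {

    // Opens a listener on 127.0.0.1:port. SO_REUSEADDR only affects bind(),
    // so it is set on the socket while it is still unbound.
    static ServerSocket listen(int port, boolean reuseAddress) {
        try {
            ServerSocket server = new ServerSocket();  // created unbound
            server.setReuseAddress(reuseAddress);      // Java's name for SO_REUSEADDR
            server.bind(new InetSocketAddress("127.0.0.1", port));
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Plays the "old process": accepts one connection, closes it from the
    // server side first (active close, so that side ends up in TIME_WAIT),
    // then releases the listener. Returns the port it was using.
    static int runOldServer() {
        try (ServerSocket listener = listen(0, true)) {  // port 0: the OS picks a free port
            int port = listener.getLocalPort();
            try (Socket client = new Socket("127.0.0.1", port);
                 Socket accepted = listener.accept()) {
                accepted.close();                        // server sends its FIN first
            }                                            // client's close completes the exchange
            return port;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Plays the "new process": rebinds the same port right away. With
    // reuseAddress=true the bind is expected to succeed even though the old
    // connection may still be in TIME_WAIT.
    static boolean restartOn(int port) {
        try (ServerSocket listener = listen(port, true)) {
            return listener.isBound();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = runOldServer();
        System.out.println("old server released port " + port);
        System.out.println("new server bound same port: " + restartOn(port));
    }
}
```

The `ServerSocket` Javadoc states that the initial SO_REUSEADDR setting is not defined, so calling `getReuseAddress()` is the way to check a platform's default. On Linux, passing `false` to `listen` in `restartOn` typically makes the rebind fail with a `BindException` while the TIME_WAIT entry exists. As Ram notes above, none of this lets two live processes listen on the same port.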

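Charlie's point that worker threads must wake up for the shutdown command, and not only for their normal inputs, can be sketched in Java as follows. This is not Solr or Jetty code. The names `GracefulWorker`, `submit` and `shutdown` are hypothetical.

The worker blocks on a queue and exits when interrupted. `Runtime.addShutdownHook` gives the JVM a chance to run that cleanup on normal exit and on SIGINT/SIGTERM. The JDK documents that hooks do not run when the process is killed with SIGKILL, which is one more reason to keep `kill -9` as the last resort.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class GracefulWorker {
    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    private final AtomicInteger handled = new AtomicInteger();
    private final Thread worker = new Thread(this::runLoop, "worker");

    GracefulWorker() {
        worker.start();
    }

    // Blocks on the queue for normal input, but also wakes up on interrupt(),
    // which is how the stop command reaches this thread.
    private void runLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                inbox.take();               // normal input
                handled.incrementAndGet();  // stand-in for real work
            }
        } catch (InterruptedException e) {
            // Stop requested: leave the loop so the thread can exit
            // and release whatever it holds (sockets, files, ports).
        }
    }

    void submit(String task) {
        inbox.add(task);
    }

    int handled() {
        return handled.get();
    }

    // Interrupts the worker and waits up to timeoutMillis; true if it exited.
    boolean shutdown(long timeoutMillis) {
        worker.interrupt();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the caller's interrupt status
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        GracefulWorker w = new GracefulWorker();
        // Shutdown hooks run on normal exit and on SIGINT/SIGTERM, never on SIGKILL.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> w.shutdown(5_000)));
        w.submit("task-1");
        w.submit("task-2");
        // Simulate a stop command arriving while the process is still up.
        System.out.println("worker stopped cleanly: " + w.shutdown(5_000));
    }
}
```

Interrupting drops any tasks still in the queue. A real server might drain or reject pending work first. The important design point is that the blocking call (`take()`) is one the stop path can actually wake up, so the process exits instead of lingering as a zombie that still holds its port.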