But all too often necessary :)

On Tue, Mar 3, 2015 at 12:14 AM Ramkumar R. Aiyengar <[email protected]> wrote:
> I agree, sigkill is typically the last resort..
>
> On 3 Mar 2015 00:49, "Reitzel, Charles" <[email protected]>
> wrote:
>
>> My bad. Too long away from sockets since cleaning up those shutdown
>> handlers. Your point is well taken, on the server side the risks of
>> consuming a stray echo packet are fairly low (but non-zero, if you’ve ever
>> spent any quality time with tcpdump/wireshark).
>>
>> Still, in a production setting, SIGKILL (aka “kill -9”) should be a last
>> resort after more reasonable methods (e.g. SIGINT, SIGTERM, SIGSTOP) have
>> failed.
>>
>> *From:* Ramkumar R. Aiyengar [mailto:[email protected]]
>> *Sent:* Monday, March 02, 2015 7:00 PM
>> *To:* [email protected]
>> *Subject:* RE: reuseAddress default in Solr jetty.xml
>>
>> No, reuseAddress doesn't allow you to have two processes, old and new,
>> listen to the same port. There's no option which allows you to do that.
>>
>> TL;DR: This can happen when you have a connection to a server which gets
>> killed hard and comes back up immediately.
>>
>> So here's what happens.
>>
>> When a server normally shuts down, it triggers an active close on all
>> open TCP connections it has. That sends a three-way msg exchange with the
>> remote recipient (FIN, FIN+ACK, ACK) at the end of which the socket is
>> closed and the kernel puts it in a TIME_WAIT state for a few minutes in the
>> background (depends on the OS, maximum tends to be 4 mins). This is needed
>> to allow for reordered older packets to reach the machine just in case. Now
>> typically if the server restarts within that period and tries to bind again
>> to the same port, the kernel is smart enough to not complain that there is
>> an existing socket in TIME_WAIT, because it knows the last sequence number
>> it used for the final message in the previous process, and since sequence
>> numbers are always increasing, it can reject any messages before that
>> sequence number as a new process has now taken the port.
>>
>> Trouble is with abnormal shutdown. There's no time for a proper goodbye,
>> so the kernel marks the socket to respond to remote packets with a rude RST
>> (reset). Since there has been no goodbye with the remote end, it also
>> doesn't know the last sequence number to delineate if a new process binds
>> to the same port. Hence by default it denies binding to the new port for
>> the TIME_WAIT period to avoid the off chance a stray packet gets picked up
>> by the new process and utterly confuses it. By setting reuseAddress, you
>> are essentially waiving this protection. Note that this possibility of
>> confusion is unbelievably minuscule in the first place (both the source and
>> destination host:port should be the same and the client port is generally
>> randomly allocated). If the port we are talking of is a local port, it's
>> almost impossible -- you have bigger problems if a TCP packet is lost or
>> delayed within the same machine!
>>
>> As to Shawn's point, for Solr's stop port, you essentially need to be
>> trying to actively shut down the server using the stop port, or be within a
>> few minutes of such an attempt while the server is killed. Just the server
>> being killed without any active connection to it is not going to cause this
>> issue.
>>
>> Hi Ram,
>>
>> It appears the problem is that the old solr/jetty process is actually
>> still running when the new solr/jetty process is started. That’s the
>> problem that needs fixing.
>>
>> This is not a rare problem in systems with worker threads dedicated to
>> different tasks. These threads need to wake up in response to the
>> shutdown signal/command, as well as the normal inputs.
>>
>> It’s a bug I’ve created and fixed a couple times over the years … :-)
>> I wouldn’t know where to start with Solr. But, as I say, re-using the port
>> is a band-aid. I’ve yet to see a case where it is the best solution.
>>
>> best,
>>
>> Charlie
>>
>> *From:* Ramkumar R. Aiyengar [mailto:[email protected]]
>> *Sent:* Saturday, February 28, 2015 8:15 PM
>> *To:* [email protected]
>> *Subject:* Re: reuseAddress default in Solr jetty.xml
>>
>> Hey Charles, see my explanation above on why this is needed. If Solr has
>> to be killed, it would generally be immediately restarted. This would
>> normally not be the case, except when things are potentially misconfigured
>> or if there is a bug, but not doing so makes the impact worse..
>>
>> In any case, turns out really that reuseAddress is true by default for
>> the connectors we use, so that really isn't the issue. The issue more
>> specifically is that the stop port doesn't do it, so the actual port by
>> itself starts just fine on a restart, but the stop port fails to bind --
>> and there's no way currently in Jetty to configure that.
>>
>> Based on my question in the jetty mailing list, I have now created an
>> issue for them..
>>
>> https://bugs.eclipse.org/bugs/show_bug.cgi?id=461133
>>
>> On Fri, Feb 27, 2015 at 3:03 PM, Reitzel, Charles <
>> [email protected]> wrote:
>>
>> Disclaimer: I’m not a Solr committer. But, as a developer, I’ve never
>> seen a good case for reusing the listening port. Better to find and fix
>> the root cause on the zombie state (or just slow shutdown, sometimes) and
>> release the port.
>>
>> *From:* Mark Miller [mailto:[email protected]]
>> *Sent:* Thursday, February 26, 2015 5:28 PM
>> *To:* [email protected]
>> *Subject:* Re: reuseAddress default in Solr jetty.xml
>>
>> +1
>>
>> - Mark
>>
>> On Thu, Feb 26, 2015 at 1:54 PM Ramkumar R. Aiyengar <
>> [email protected]> wrote:
>>
>> The jetty.xml we currently ship by default doesn't set reuseAddress=true.
>> If you are having a bad GC day with things going OOM and resulting in Solr
>> not even being able to shut down cleanly (or the oom_solr.sh script killing
>> it), whatever external service management mechanism you have is probably
>> going to try to respawn it and fail with the default config because the
>> ports will be in TIME_WAIT. I guess there's the usual disclaimer with
>> reuseAddress causing stray packets to reach the restarted server, but
>> sounds like at least the default should be true..
>>
>> I can raise a JIRA, but just wanted to check if anyone has any opinions
>> either way..
>>
>> *************************************************************************
>> This e-mail may contain confidential or privileged information.
>> If you are not the intended recipient, please notify the sender
>> immediately and then delete it.
>>
>> TIAA-CREF
>> *************************************************************************
>>
>> --
>>
>> Not sent from my iPhone or my Blackberry or anyone else's
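
For anyone reading the archive later: the reuseAddress setting discussed in this thread corresponds to the SO_REUSEADDR socket option, which is also what Java's ServerSocket.setReuseAddress(true) sets. Below is a minimal sketch in Python of a listener that opts in before binding. The helper name bind_listener and the loopback address are made up for illustration; this is not how Jetty or Solr implement their connectors or the stop port.

```python
import socket


def bind_listener(port: int, reuse_address: bool) -> socket.socket:
    """Hypothetical helper: open a TCP listener on 127.0.0.1:port.

    When reuse_address is True, SO_REUSEADDR is set before bind(), which is
    the socket-level equivalent of reuseAddress=true in the thread.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_address:
            # The option has to be set before bind(); setting it afterwards
            # does not change a bind that has already succeeded or failed.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("127.0.0.1", port))
        sock.listen()
    except OSError:
        # Typically EADDRINUSE when the port is unavailable; do not leak the fd.
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    # Port 0 asks the kernel for any free port, so the demo cannot collide.
    listener = bind_listener(0, reuse_address=True)
    print("listening on port", listener.getsockname()[1])
    listener.close()
```

Note that this only changes how bind() treats leftover connections on the port, as described in the thread. It does not let two live processes listen on the same port, so it would not help if the old process is still running.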
