My bad.  Too long away from sockets since cleaning up those shutdown handlers.  
Your point is well taken: on the server side, the risks of consuming a stray 
echo packet are fairly low (but non-zero, if you’ve ever spent any quality time 
with tcpdump/wireshark).

Still, in a production setting, SIGKILL (aka “kill -9”) should be a last resort 
after more reasonable methods (e.g. SIGINT, SIGTERM, SIGSTOP) have failed.

From: Ramkumar R. Aiyengar [mailto:andyetitmo...@gmail.com]
Sent: Monday, March 02, 2015 7:00 PM
To: dev@lucene.apache.org
Subject: RE: reuseAddress default in Solr jetty.xml


No, reuseAddress doesn't allow you to have two processes, old and new, listen 
to the same port. There's no option which allows you to do that.

TL;DR: This can happen when you have a connection to a server which gets killed 
hard and comes back up immediately.

So here's what happens.

When a server normally shuts down, it triggers an active close on all open TCP 
connections it has. That starts a three-way message exchange with the remote 
recipient (FIN, FIN+ACK, ACK), at the end of which the socket is closed and the 
kernel puts it in a TIME_WAIT state for a few minutes in the background 
(depends on the OS; the maximum tends to be 4 minutes). This is needed to allow for 
reordered older packets to reach the machine just in case. Now typically if the 
server restarts within that period and tries to bind again to the same port, 
the kernel is smart enough to not complain that there is an existing socket in 
TIME_WAIT, because it knows the last sequence number it used for the final 
message in the previous process, and since sequence numbers are always 
increasing, it can reject any messages before that sequence number as a new 
process has now taken the port.

Trouble is with abnormal shutdown. There's no time for a proper goodbye, so the 
kernel marks the socket to respond to remote packets with a rude RST (reset). 
Since there has been no goodbye with the remote end, it also doesn't know the 
last sequence number to delineate if a new process binds to the same port. 
Hence by default it denies binding to that port for the TIME_WAIT period to 
avoid the off chance that a stray packet gets picked up by the new process and 
utterly confuses it. By setting reuseAddress, you are essentially waiving 
this protection. Note that this possibility of confusion is unbelievably 
minuscule in the first place (both the source and destination host:port would 
have to be the same, and the client port is generally randomly allocated). If the port 
we are talking of is a local port, it's almost impossible -- you have bigger 
problems if a TCP packet is lost or delayed within the same machine!
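
For illustration, here is a minimal sketch of what setting reuseAddress amounts to at the plain java.net level. It is not Jetty's or Solr's code; the class name ReuseAddressSketch and the helper listen() are made up for this example. The one real detail worth noting is that the option has to be set on an unbound socket, before bind() is called.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReuseAddressSketch {

    /**
     * Hypothetical helper: opens a listening socket on the loopback
     * interface with SO_REUSEADDR set to the given value.
     * Port 0 asks the OS to pick any free port.
     */
    static ServerSocket listen(int port, boolean reuse) throws IOException {
        ServerSocket server = new ServerSocket();   // created unbound
        server.setReuseAddress(reuse);              // must happen before bind()
        server.bind(new InetSocketAddress("127.0.0.1", port));
        return server;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = listen(0, true)) {
            System.out.println("listening on port " + server.getLocalPort()
                    + ", reuseAddress=" + server.getReuseAddress());
        }
    }
}
```

With reuse set to true, a restarted process should be able to bind even while the kernel still holds old connections on that port in TIME_WAIT. It does not let two live processes listen on the same port.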

As to Shawn's point, for Solr's stop port, you essentially need to be actively 
trying to shut down the server using the stop port, or be within a few minutes of 
such an attempt, when the server is killed. Just the server being killed 
without any active connection to it is not going to cause this issue.

Hi Ram,

It appears the problem is that the old solr/jetty process is actually still 
running when the new solr/jetty process is started.   That’s the problem that 
needs fixing.

This is not a rare problem in systems with worker threads dedicated to 
different tasks.   These threads need to wake up in response to the shutdown 
signal/command, as well as the normal inputs.
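
A rough sketch of that pattern in Java, assuming a worker that blocks on a queue for its normal inputs. The names InterruptibleWorker, submit() and shutdown() are invented for this example and have nothing to do with Solr's or Jetty's actual threads. The idea is only that the blocking wait must be interruptible, so a shutdown request wakes the thread instead of leaving it parked.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class InterruptibleWorker implements Runnable {
    private final BlockingQueue<String> inputs = new LinkedBlockingQueue<>();
    private final Thread thread = new Thread(this, "worker");

    public void start() {
        thread.start();
    }

    public void submit(String task) {
        inputs.add(task);
    }

    /**
     * Wakes the worker via interrupt and waits up to timeoutMillis
     * for it to exit. Returns true if the thread has terminated.
     */
    public boolean shutdown(long timeoutMillis) throws InterruptedException {
        thread.interrupt();
        thread.join(timeoutMillis);
        return !thread.isAlive();
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // Blocks for normal input; throws InterruptedException
                // if shutdown() interrupts the thread while it waits.
                String task = inputs.take();
                handle(task);
            }
        } catch (InterruptedException e) {
            // Shutdown requested while waiting: fall through and exit.
        }
    }

    private void handle(String task) {
        // Placeholder for real work.
    }

    public static void main(String[] args) throws InterruptedException {
        InterruptibleWorker worker = new InterruptibleWorker();
        worker.start();
        worker.submit("example");
        System.out.println("stopped cleanly: " + worker.shutdown(1000));
    }
}
```

A worker written this way releases its resources promptly on shutdown, which is the alternative to relying on port reuse.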

It’s a bug I’ve created and fixed a couple times over the years … :-)    I 
wouldn’t know where to start with Solr.  But, as I say, re-using the port is a 
band-aid.  I’ve yet to see a case where it is the best solution.

best,
Charlie

From: Ramkumar R. Aiyengar [mailto:andyetitmo...@gmail.com]
Sent: Saturday, February 28, 2015 8:15 PM
To: dev@lucene.apache.org
Subject: Re: reuseAddress default in Solr jetty.xml

Hey Charles, see my explanation above on why this is needed. If Solr has to be 
killed, it would generally be immediately restarted. This would normally not 
be the case, except when things are potentially misconfigured or if there is a 
bug, but not setting reuseAddress makes the impact worse.
In any case, it turns out that reuseAddress is true by default for the 
connectors we use, so that really isn't the issue. The issue, more specifically, 
is that the stop port doesn't set it, so the actual port by itself starts just 
fine on a restart, but the stop port fails to bind -- and there's currently no 
way in Jetty to configure that.
Based on my question on the Jetty mailing list, I have now created an issue for 
them.

https://bugs.eclipse.org/bugs/show_bug.cgi?id=461133

On Fri, Feb 27, 2015 at 3:03 PM, Reitzel, Charles 
<charles.reit...@tiaa-cref.org> wrote:
Disclaimer: I’m not a Solr committer.  But, as a developer, I’ve never seen a 
good case for reusing the listening port.   Better to find and fix the root 
cause on the zombie state (or just slow shutdown, sometimes) and release the 
port.

From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Thursday, February 26, 2015 5:28 PM
To: dev@lucene.apache.org
Subject: Re: reuseAddress default in Solr jetty.xml

+1

- Mark

On Thu, Feb 26, 2015 at 1:54 PM Ramkumar R. Aiyengar 
<andyetitmo...@gmail.com> wrote:
The jetty.xml we currently ship by default doesn't set reuseAddress=true. If 
you are having a bad GC day with things going OOM and resulting in Solr not 
even being able to shut down cleanly (or the oom_solr.sh script killing it), 
whatever external service management mechanism you have is probably going to 
try to respawn it and fail with the default config because the ports will be in 
TIME_WAIT. I guess there's the usual disclaimer about reuseAddress causing stray 
packets to reach the restarted server, but it sounds like at least the default 
should be true.

I can raise a JIRA, but just wanted to check if anyone has any opinions either 
way.


*************************************************************************
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*************************************************************************



--
Not sent from my iPhone or my Blackberry or anyone else's
