[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

Shalin Shekhar Mangar (JIRA) Tue, 12 Jul 2016 11:52:39 -0700

    [ https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373452#comment-15373452 ]


Shalin Shekhar Mangar commented on SOLR-9290:
---------------------------------------------

It is very easily reproducible on stock Solr with SSL enabled. My test setup 
creates two SSL-enabled Solr instances with a 5 shard x 2 replica collection 
and runs a short indexing program (just 9 update requests with 1 document each 
and a commit at the end). Run the indexing program repeatedly and the number of 
connections in the CLOSE_WAIT state gradually increases.

Interestingly, the number of connections stuck in CLOSE_WAIT decreases during 
indexing and increases again about 10 or so seconds after the indexing is 
stopped.

I can reproduce the problem on 6.1, 6.0, 5.5.1 and 5.3.2. I am not able to 
reproduce this on master, although I don't see anything relevant that has 
changed since 6.1. I tried this only once, so it may have just been bad timing.

When the connections show in CLOSE_WAIT state, the recv-q buffer always has 
exactly 70 bytes.
{code}
netstat -tonp | grep CLOSE_WAIT | grep java
tcp       70      0 127.0.0.1:56538         127.0.1.1:8983          CLOSE_WAIT  21654/java       off (0.00/0/0)
tcp       70      0 127.0.0.1:47995         127.0.1.1:8984          CLOSE_WAIT  21654/java       off (0.00/0/0)
tcp       70      0 127.0.0.1:47477         127.0.1.1:8984          CLOSE_WAIT  21654/java       off (0.00/0/0)
tcp       70      0 127.0.0.1:47996         127.0.1.1:8984          CLOSE_WAIT  21654/java       off (0.00/0/0)
tcp       70      0 127.0.0.1:56644         127.0.1.1:8983          CLOSE_WAIT  21654/java       off (0.00/0/0)
tcp       70      0 127.0.0.1:56533         127.0.1.1:8983          CLOSE_WAIT  21654/java       off (0.00/0/0)
...
{code}
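
To track how the CLOSE_WAIT count and the recv-q size change between indexing runs, the netstat output can be tallied programmatically. The following is only a rough sketch: the class and method names are made up for illustration, and it assumes the Linux netstat column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) with each socket on a single, unwrapped line.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: tallies CLOSE_WAIT sockets by Recv-Q size from `netstat -tonp` lines. */
class CloseWaitTally {

    /**
     * Returns a map of Recv-Q byte count to number of CLOSE_WAIT sockets.
     * Assumes columns: Proto Recv-Q Send-Q Local Foreign State ...
     */
    static Map<Integer, Integer> byRecvQ(List<String> netstatLines) {
        Map<Integer, Integer> tally = new TreeMap<>();
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Skip anything that is not a tcp row in the CLOSE_WAIT state.
            if (cols.length < 6 || !cols[0].startsWith("tcp") || !"CLOSE_WAIT".equals(cols[5])) {
                continue;
            }
            int recvQ;
            try {
                recvQ = Integer.parseInt(cols[1]);
            } catch (NumberFormatException e) {
                continue; // header or malformed row
            }
            tally.merge(recvQ, 1, Integer::sum);
        }
        return tally;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "tcp       70      0 127.0.0.1:56538         127.0.1.1:8983          CLOSE_WAIT  21654/java       off (0.00/0/0)",
            "tcp       70      0 127.0.0.1:47995         127.0.1.1:8984          CLOSE_WAIT  21654/java       off (0.00/0/0)",
            "tcp       1      0 127.0.0.1:41723         127.0.1.1:8983          CLOSE_WAIT  2522/java        off (0.00/0/0)");
        // Prints the Recv-Q size -> socket count tally for the sample rows.
        System.out.println(byRecvQ(sample));
    }
}
```

Running this against successive netstat snapshots would show whether the 70-byte bucket keeps growing between indexing runs.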

If I run the same steps with SSL disabled, then the connections in CLOSE_WAIT 
state have just 1 byte in recv-q. I don't see the number of such connections 
increasing with indexing over time, but I know for a fact (from a client) that 
eventually more and more connections pile up in this state even without SSL.
{code}
tcp       1      0 127.0.0.1:41723         127.0.1.1:8983          CLOSE_WAIT  2522/java        off (0.00/0/0)
tcp       1      0 127.0.0.1:41780         127.0.1.1:8983          CLOSE_WAIT  2640/java        off (0.00/0/0)
...
{code}

I enabled debug logging for PoolingHttpClientConnectionManager (used in 6.x) 
and PoolingClientConnectionManager (used in 5.x.x). After running the indexing 
program and verifying that some connections were in CLOSE_WAIT, I grepped the 
logs for connections leased vs. released. The two counts always match, which 
means that the connections are always given back to the pool.
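
The leased vs. released comparison can also be scripted so it is easy to repeat after each run. This is a minimal sketch, not the exact commands I used: the class name is hypothetical, and the marker strings "Connection leased" and "Connection released" are assumptions about the debug message text, so check them against your own log output.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: compares lease and release events in connection-manager debug logs. */
class LeaseBalance {
    // Assumed debug message markers; verify against the actual log lines.
    static final String LEASED = "Connection leased";
    static final String RELEASED = "Connection released";

    /** Counts the lines that contain the given marker. */
    static long count(List<String> lines, String marker) {
        return lines.stream().filter(l -> l.contains(marker)).count();
    }

    /** A positive result means some leased connections were never returned to the pool. */
    static long outstanding(List<String> lines) {
        return count(lines, LEASED) - count(lines, RELEASED);
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "DEBUG Connection leased: [id: 1]",
            "DEBUG Connection released: [id: 1]",
            "DEBUG Connection leased: [id: 2]",
            "DEBUG Connection released: [id: 2]");
        System.out.println("outstanding leases: " + outstanding(sample));
    }
}
```

A result of zero matches what I see in the logs: every leased connection is released, so a plain pool leak does not explain the sockets in CLOSE_WAIT.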

Now, some connections hanging around in CLOSE_WAIT are to be expected because of 
the following (quoted from the HttpClient documentation):
{quote}
One of the major shortcomings of the classic blocking I/O model is that the 
network socket can react to I/O events only when blocked in an I/O operation. 
When a connection is released back to the manager, it can be kept alive however 
it is unable to monitor the status of the socket and react to any I/O events. 
If the connection gets closed on the server side, the client side connection is 
unable to detect the change in the connection state (and react appropriately by 
closing the socket on its end).
HttpClient tries to mitigate the problem by testing whether the connection is 
'stale', that is no longer valid because it was closed on the server side, 
prior to using the connection for executing an HTTP request. The stale 
connection check is not 100% reliable. The only feasible solution that does not 
involve a one thread per socket model for idle connections is a dedicated 
monitor thread used to evict connections that are considered expired due to a 
long period of inactivity. The monitor thread can periodically call 
ClientConnectionManager#closeExpiredConnections() method to close all expired 
connections and evict closed connections from the pool. It can also optionally 
call ClientConnectionManager#closeIdleConnections() method to close all 
connections that have been idle over a given period of time.
{quote}

I'm going to try adding such a monitor thread and see if this is still a 
problem.
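
For reference, a monitor thread along the lines described in the quote could look roughly like the sketch below. It is not a patch: `EvictableConnectionManager` is a hypothetical stand-in interface that mirrors the two eviction methods named in the quote, so the example compiles without HttpClient on the classpath, and the period and idle timeout are arbitrary values. In Solr the real pooling connection manager would have to be adapted to it.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the connection manager's two eviction methods. */
interface EvictableConnectionManager {
    void closeExpiredConnections();
    void closeIdleConnections(long idleTime, TimeUnit unit);
}

/** Periodically evicts expired and long-idle connections from the pool. */
class IdleConnectionMonitor extends Thread {
    private final EvictableConnectionManager manager;
    private final long periodMillis;
    private final long maxIdleMillis;
    private final Object lock = new Object();
    private volatile boolean shutdown;

    IdleConnectionMonitor(EvictableConnectionManager manager, long periodMillis, long maxIdleMillis) {
        super("idle-connection-monitor");
        this.manager = manager;
        this.periodMillis = periodMillis;
        this.maxIdleMillis = maxIdleMillis;
        setDaemon(true); // do not keep the JVM alive just for eviction
    }

    /** One eviction pass: expired connections first, then idle ones. */
    void sweepOnce() {
        manager.closeExpiredConnections();
        manager.closeIdleConnections(maxIdleMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void run() {
        try {
            while (!shutdown) {
                synchronized (lock) {
                    // A spurious wakeup only causes an early sweep, which is harmless.
                    lock.wait(periodMillis);
                }
                if (!shutdown) {
                    sweepOnce();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops the monitor and wakes it up if it is waiting. */
    void shutdown() {
        shutdown = true;
        synchronized (lock) {
            lock.notifyAll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        EvictableConnectionManager logging = new EvictableConnectionManager() {
            public void closeExpiredConnections() {
                System.out.println("closing expired connections");
            }
            public void closeIdleConnections(long idleTime, TimeUnit unit) {
                System.out.println("closing connections idle for " + idleTime + " " + unit);
            }
        };
        IdleConnectionMonitor monitor = new IdleConnectionMonitor(logging, 100, 30000);
        monitor.start();
        Thread.sleep(350);
        monitor.shutdown();
        monitor.join();
    }
}
```

The sweep period bounds how long a socket closed by the server can sit in CLOSE_WAIT on the client side, so it probably needs to be shorter than the server's idle timeout to have a visible effect.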

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-9290
>                 URL: https://issues.apache.org/jira/browse/SOLR-9290
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 5.5.1, 5.5.2
>            Reporter: Anshum Gupta
>            Priority: Critical
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



