[ https://issues.apache.org/jira/browse/SOLR-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878602#action_12878602 ]

Karl Wright commented on SOLR-1951:
-----------------------------------

A site I found talks about this problem and potential solutions:

>>>>>>
First of all, are the TIME_WAITs client-side or server-side? If server-side,
then you need to redesign your protocol so that your clients initiate the
active close of the connection, whenever possible... (Except for the server
occasionally booting idle/hostile clients, etc...) Generally, a server will be
handling clients from many different machines, so it's far better to spread
out the TIME_WAIT load among the many clients than it is to make the server
bear the full load of them all...

If they're client-side, it sounds like you just have a single client, then?
And it's making a whole bunch of repeated one-shot connections to the
server(s)? If so, then you need to redesign your protocol to add a persistent
mode of some kind, so your client can just reuse a single connection to the
server for handling multiple requests, without needing to open a whole new
connection for each one... You'll find your performance will improve greatly
as well, since the set-up/tear-down overhead for TCP is now adding up to a
great deal of your processing in your current scheme...

However, if you persist in truly wanting to get around TIME_WAIT (and I think
it's a horribly BAD idea to try to do so, and don't recommend ever doing it),
then what you want is to set "l_linger" to 0... That will force a RST of the
TCP connection, thereby bypassing the normal shutdown procedure and never
entering TIME_WAIT... But, honestly, DON'T DO THIS! Even if you THINK you know
WTF you're doing! It's just not a good idea, ever... You risk data loss
(because your close() of the socket will now just throw away outstanding data,
instead of making sure it's sent), you risk corruption of future connections
(due to reuse of ephemeral ports that would otherwise be held in TIME_WAIT, if
a wandering dup packet happens to show up, or something), and you break a
fundamental feature of TCP that's put there for a very good reason... All to
work around a poorly designed app-level protocol... But, anyway, with that
said, here's the FAQ page on SO_LINGER...
<<<<<<

So, if this can be taken at face value, it would seem to argue that the
massive number of TIME_WAITs is the result of every document post opening and
closing its socket connection to the server, and that the best solution is to
keep the socket connection alive across multiple requests. Under HTTP, and
with Jetty, it's not yet clear whether that's achievable, but a little
research should help.
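
For what it's worth, here's a minimal sketch of what connection reuse might
look like on the indexing client side, assuming SolrJ's CommonsHttpSolrServer
backed by Commons HttpClient 3.x with its MultiThreadedHttpConnectionManager;
the pool sizes, URL, and file paths below are made up for illustration:

import java.io.File;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PersistentPoster {
  public static void main(String[] args) throws Exception {
    // Pooled connection manager: HTTP/1.1 keep-alive connections are held
    // open and reused across requests, instead of a fresh socket being
    // opened and torn down for every document post.
    MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
    mgr.getParams().setDefaultMaxConnectionsPerHost(10);
    mgr.getParams().setMaxTotalConnections(10);

    // One shared server instance for all indexing threads.
    CommonsHttpSolrServer server =
      new CommonsHttpSolrServer("http://localhost:8983/solr", new HttpClient(mgr));

    for (int i = 0; i < 1000; i++) {
      ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
      req.addFile(new File("docs/doc" + i + ".pdf")); // hypothetical input files
      req.setParam("literal.id", "doc" + i);
      server.request(req); // rides a pooled keep-alive connection
    }
    server.commit();
  }
}

Fewer connection teardowns means fewer TIME_WAITs no matter which side
performs the active close.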

If that doesn't work out, SO_LINGER = 0 may well do the trick, but I think
that might require a change to Jetty.
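
In case it's useful for experimenting, the client-side version of that is
just java.net's Socket.setSoLinger; a tiny sketch (the port and payload here
are illustrative only, and per the quote above this is very much a last
resort):

import java.io.OutputStream;
import java.net.Socket;

public class LingerDemo {
  public static void main(String[] args) throws Exception {
    Socket socket = new Socket("localhost", 8983);
    // on=true, linger=0: close() sends an RST instead of a FIN, so the
    // socket never enters TIME_WAIT, and any unsent data is discarded.
    socket.setSoLinger(true, 0);
    OutputStream out = socket.getOutputStream();
    out.write("GET /solr/admin/ping HTTP/1.0\r\n\r\n".getBytes("US-ASCII"));
    out.flush();
    socket.close(); // aborts the connection: no TIME_WAIT on this side
  }
}

Doing the same for the server's sockets would mean getting at Jetty's
connector internals, hence the need for a Jetty change.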


> extractingUpdateHandler doesn't close socket handles promptly, and indexing 
> load tests eventually run out of resources
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1951
>                 URL: https://issues.apache.org/jira/browse/SOLR-1951
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 1.4.1, 1.5
>         Environment: sun java
> solr 1.5 build based on trunk
> debian linux "lenny"
>            Reporter: Karl Wright
>         Attachments: solr-1951.zip
>
>
> When multiple threads pound on extractingUpdateRequestHandler using multipart 
> form posting over an extended period of time, I'm seeing a huge number of 
> sockets piling up in the following state:
> tcp6       0      0 127.0.0.1:8983          127.0.0.1:44058         TIME_WAIT
> Despite the fact that the client can only have 10 sockets open at a time, 
> huge numbers of sockets accumulate that are in this state:
> r...@duck6:~# netstat -an | fgrep :8983 | wc
>   28223  169338 2257840
> r...@duck6:~#
> The sheer number of sockets lying around seems to eventually cause 
> commons-fileupload to fail (silently - another bug) in creating a temporary 
> file to contain the content data.  This causes Solr to erroneously return a 
> 400 code with "missing_content_data" or some such to the indexing poster.
