[ https://issues.apache.org/jira/browse/SOLR-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878602#action_12878602 ]
Karl Wright commented on SOLR-1951:
-----------------------------------

A site I found talks about this problem and potential solutions:

>>>>>>
First of all, are the TIME_WAITs client-side or server-side?

If server-side, then you need to redesign your protocol so that your clients initiate the active close of the connection whenever possible (except for the server occasionally booting idle/hostile clients, etc.). Generally, a server will be handling clients from many different machines, so it's far better to spread out the TIME_WAIT load among the many clients than it is to make the server bear the full load of them all.

If they're client-side, it sounds like you just have a single client, then? And it's making a whole bunch of repeated one-shot connections to the server(s)? If so, then you need to redesign your protocol to add a persistent mode of some kind, so your client can reuse a single connection to the server for handling multiple requests, without needing to open a whole new connection for each one. You'll find your performance will improve greatly as well, since the set-up/tear-down overhead for TCP is now adding up to a great deal of your processing in your current scheme.

However, if you persist in truly wanting to get around TIME_WAIT (and I think it's a horribly BAD idea to try to do so, and don't recommend ever doing it), then what you want is to set "l_linger" to 0. That will force a RST of the TCP connection, thereby bypassing the normal shutdown procedure and never entering TIME_WAIT. But, honestly, DON'T DO THIS! Even if you THINK you know WTF you're doing! It's just not a good idea, ever.
You risk data loss (because your close() of the socket will now just throw away outstanding data instead of making sure it's sent), you risk corruption of future connections (due to reuse of ephemeral ports that would otherwise be held in TIME_WAIT, if a wandering duplicate packet happens to show up, or something), and you break a fundamental feature of TCP that's put there for a very good reason, all to work around a poorly designed app-level protocol. But, anyway, with that said, here's the FAQ page on SO_LINGER...
<<<<<<

So, if this can be taken at face value, it would seem to argue that the massive numbers of TIME_WAITs are the result of every document post opening and closing the socket connection to the server, and that the best solution is to keep the socket connection alive across multiple requests. Under HTTP and Jetty, it's not yet clear whether that goal is achievable, but a little research should help. If that doesn't work out, setting SO_LINGER = 0 may well do the trick, but I think that might require a change to Jetty.
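The persistent-connection approach the quoted FAQ recommends can be sketched on the client side with plain JDK classes. This is only an illustration, not the client code from the attached load test: the loopback com.sun.net.httpserver.HttpServer stands in for Jetty, and the /update path is just a placeholder. HttpURLConnection pools keep-alive connections per host, so repeated posts can ride on a single TCP connection, provided each response body is read to EOF and closed.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class KeepAliveDemo {
    public static void main(String[] args) throws Exception {
        // Tiny loopback server standing in for the Solr/Jetty endpoint.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/update", exchange -> {
            byte[] body = "OK".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();

        URL url = new URL("http://127.0.0.1:"
                + server.getAddress().getPort() + "/update");

        // HttpURLConnection keeps HTTP/1.1 connections alive per host, so
        // these three requests can reuse one TCP connection instead of
        // leaving three sockets behind in TIME_WAIT.
        for (int i = 0; i < 3; i++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                // Drain the body fully so the connection goes back to the pool.
                while (in.read() != -1) { }
            }
            System.out.println("request " + i + " -> " + conn.getResponseCode());
        }
        server.stop(0);
    }
}
```

If the indexing client drops its connection after every post, each close leaves a TIME_WAIT behind; reuse like the above is what keeps the count flat under sustained load.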
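For reference, the SO_LINGER = 0 escape hatch the quote warns against looks like this at the Java socket level. A minimal sketch against a loopback server, shown only to make the mechanism concrete: with l_onoff enabled and l_linger set to 0, close() sends a RST instead of a FIN, skipping TIME_WAIT at the cost of discarding any unsent data.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class LingerDemo {
    public static void main(String[] args) throws IOException {
        // Loopback server so the client has something to connect to.
        try (ServerSocket server = new ServerSocket(0)) {
            Socket client = new Socket("127.0.0.1", server.getLocalPort());
            Socket accepted = server.accept();

            // l_onoff = true, l_linger = 0: close() aborts the connection
            // with a RST, so the socket never enters TIME_WAIT. This is
            // exactly the behavior the FAQ above says not to use.
            client.setSoLinger(true, 0);
            System.out.println("SO_LINGER value: " + client.getSoLinger());

            client.close();
            accepted.close();
        }
    }
}
```

Doing this from Solr would mean reaching the underlying connector socket, which is why a Jetty change would likely be needed.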
> extractingUpdateHandler doesn't close socket handles promptly, and indexing
> load tests eventually run out of resources
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1951
>                 URL: https://issues.apache.org/jira/browse/SOLR-1951
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 1.4.1, 1.5
>         Environment: sun java
>                      solr 1.5 build based on trunk
>                      debian linux "lenny"
>            Reporter: Karl Wright
>         Attachments: solr-1951.zip
>
> When multiple threads pound on extractingUpdateRequestHandler using multipart
> form posting over an extended period of time, I'm seeing a huge number of
> sockets piling up in the following state:
>
> tcp6       0      0 127.0.0.1:8983      127.0.0.1:44058     TIME_WAIT
>
> Despite the fact that the client can only have 10 sockets open at a time,
> huge numbers of sockets accumulate that are in this state:
>
> r...@duck6:~# netstat -an | fgrep :8983 | wc
>   28223  169338 2257840
> r...@duck6:~#
>
> The sheer number of sockets lying around seems to eventually cause
> commons-fileupload to fail (silently, which is another bug) in creating a
> temporary file to contain the content data. This causes Solr to erroneously
> return a 400 code with "missing_content_data" or some such to the indexing
> poster.