[ 
https://issues.apache.org/jira/browse/THRIFT-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189911#comment-13189911
 ] 

Bryan Duxbury commented on THRIFT-1493:
---------------------------------------

Thrift has no way of knowing that the error is a fatal one in this case, since 
given enough of a decrease in request volume, there *will* be sockets available 
again. People often get bitten by this the first time they put up a high-volume 
Thrift server. If you are going to have lots of connections, you should raise 
the OS file handle limit for the server process as high as it will go. 
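
As a rough illustration of checking how close a JVM is to that limit, here is 
a minimal sketch. The class and method names (FdHeadroom, usedFraction, 
readCounts) are made up for this example. It relies on the JDK-specific 
com.sun.management.UnixOperatingSystemMXBean, which is only present on 
Unix-like platforms with HotSpot-derived JDKs, so treat the availability check 
as an assumption about your runtime.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdHeadroom {

    /** Fraction of the descriptor limit in use, or -1.0 if the inputs are unusable. */
    static double usedFraction(long open, long max) {
        if (max <= 0 || open < 0) {
            return -1.0;
        }
        return (double) open / (double) max;
    }

    /** Returns {open, max} as reported by the JVM, or null when the bean is unavailable. */
    static long[] readCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // not a Unix-style JVM, nothing to report
    }

    public static void main(String[] args) {
        long[] counts = readCounts();
        if (counts == null) {
            System.out.println("File descriptor counts are not exposed by this JVM.");
            return;
        }
        System.out.println("open=" + counts[0] + " max=" + counts[1]
                + " used=" + usedFraction(counts[0], counts[1]));
    }
}
```

Running this inside the server process (or exposing the same numbers over JMX) 
shows whether the limit actually in effect matches what you configured at the 
OS level.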

If you want to be able to react to this problem, monitor the logs externally. 
If disk space could be an issue, then change the log level or experiment with 
rotation/zipping.
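
For anyone who maintains their own copy of the accept loop (as Flume does with 
TSaneThreadPoolServer), one way to limit log growth is to back off and log 
less often while accepts keep failing. The following is only a sketch of that 
idea using plain java.net.ServerSocket, not Thrift's implementation. The class 
name ThrottledAcceptLoop, the delay constants, and the log-on-powers-of-two 
rule are all assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;

public class ThrottledAcceptLoop {
    static final long BASE_DELAY_MS = 10;    // hypothetical starting delay
    static final long MAX_DELAY_MS = 5_000;  // hypothetical upper bound

    private volatile boolean stopped = false;

    void stop() {
        stopped = true;
    }

    /** Exponential backoff: 10 ms, 20 ms, 40 ms, ... capped at MAX_DELAY_MS. */
    static long backoffMillis(int consecutiveFailures) {
        if (consecutiveFailures <= 0) {
            return 0;
        }
        int shift = Math.min(consecutiveFailures - 1, 20); // keep the shift small
        return Math.min(MAX_DELAY_MS, BASE_DELAY_MS << shift);
    }

    /** Log only on the 1st, 2nd, 4th, 8th, ... consecutive failure. */
    static boolean shouldLog(int consecutiveFailures) {
        return consecutiveFailures > 0
                && (consecutiveFailures & (consecutiveFailures - 1)) == 0;
    }

    void serve(ServerSocket server, ExecutorService pool) throws InterruptedException {
        int failures = 0; // declared outside the loop so it survives iterations
        while (!stopped) {
            try {
                Socket client = server.accept();
                failures = 0; // a successful accept resets the backoff
                pool.execute(() -> handle(client));
            } catch (IOException e) {
                if (stopped) {
                    break;
                }
                failures++;
                if (shouldLog(failures)) {
                    System.err.println("accept failed " + failures + " time(s) in a row: " + e);
                }
                Thread.sleep(backoffMillis(failures)); // give descriptors time to free up
            }
        }
    }

    /** Placeholder handler: a real server would process the request here. */
    static void handle(Socket client) {
        try {
            client.close();
        } catch (IOException ignored) {
            // nothing useful to do if close fails
        }
    }
}
```

The loop still never gives up on its own, which matches the point that the 
condition can clear once load drops. It just spends the bad period sleeping 
rather than writing a stack trace per iteration.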
                
> Possible infinite loop in TThreadPoolServer
> -------------------------------------------
>
>                 Key: THRIFT-1493
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1493
>             Project: Thrift
>          Issue Type: Bug
>          Components: Java - Library
>    Affects Versions: 0.7
>         Environment: Debian Squeeze
>            Reporter: bert Passek
>
> I just faced a major problem in Thrift in combination with Flume, but the 
> problem actually could be tracked down to the Thrift library.
> I'm using Thrift in a typical client/server environment for tracking tons of 
> data. We ran into an exception which basically looks like:
> 2012-01-11 14:57:30,487 ERROR com.cloudera.flume.core.connector.DirectDriver: Exiting driver logicalNode newsletterImpressionLog01-21 in error state ThriftEventSource | CassandraSink because sleep interrupted
> 2012-01-11 17:18:14,808 WARN org.apache.thrift.server.TSaneThreadPoolServer: Transport error occurred during acceptance of message.
> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>         at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:139)
>         at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>         at org.apache.thrift.server.TSaneThreadPoolServer$1.run(TSaneThreadPoolServer.java:175)
> Caused by: java.net.SocketException: Too many open files
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
>         at java.net.ServerSocket.implAccept(ServerSocket.java:462)
>         at java.net.ServerSocket.accept(ServerSocket.java:430)
>         at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:134)
>         ... 2 more
> 2012-01-11 17:18:14,809 WARN org.apache.thrift.server.TSaneThreadPoolServer: Transport error occurred during acceptance of message.
> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>         at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:139)
>         at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>         at org.apache.thrift.server.TSaneThreadPoolServer$1.run(TSaneThreadPoolServer.java:175)
> Caused by: java.net.SocketException: Too many open files
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
>         at java.net.ServerSocket.implAccept(ServerSocket.java:462)
>         at java.net.ServerSocket.accept(ServerSocket.java:430)
>         at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:134)
>         ... 2 more
> Note: Flume uses its own implementation of TThreadPoolServer, which is 
> literally copied and pasted from the original Thrift source code. Flume 
> embeds this part of the Thrift library in a massively multi-threaded 
> environment.
> I was running out of socket connections, as indicated by the exception "Too 
> many open files". This exception causes an infinite loop in this part of the 
> serve() method:
> while (!stopped_) {
>   int failureCount = 0;
>   try {
>     TTransport client = serverTransport_.accept();
>     WorkerProcess wp = new WorkerProcess(client);
>     executorService_.execute(wp);
>   } catch (TTransportException ttx) {
>     if (!stopped_) {
>       ++failureCount;
>       LOGGER.warn("Transport error occurred during acceptance of message.", ttx);
>     }
>   }
> }
> Furthermore, in an overnight run I ran out of disk space because the logged 
> exceptions grew the log file dramatically. There was no way to recover.
> If a critical exception occurs, the while loop never stops. That can only be 
> done by calling the stop() method.
> The question is how to handle exceptions like the one above in general. I 
> can't even catch the exception, because it is only logged and not handled in 
> any way. So there is no way to react, for example by doing some cleanup or 
> restarting the server.
> Best Regards 
> Bert Passek


        
