[ https://issues.apache.org/jira/browse/THRIFT-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bryan Duxbury closed THRIFT-1493.
---------------------------------
Resolution: Not A Problem
> Possible infinite loop in TThreadPoolServer
> -------------------------------------------
>
> Key: THRIFT-1493
> URL: https://issues.apache.org/jira/browse/THRIFT-1493
> Project: Thrift
> Issue Type: Bug
> Components: Java - Library
> Affects Versions: 0.7
> Environment: Debian Squeeze
> Reporter: bert Passek
>
> I just ran into a major problem using Thrift together with Flume, and the
> problem could actually be traced back to the Thrift library.
> I'm using Thrift in a typical client/server environment to collect large
> volumes of data. We ran into exceptions that look basically like this:
> 2012-01-11 14:57:30,487 ERROR com.cloudera.flume.core.connector.DirectDriver: Exiting driver logicalNode newsletterImpressionLog01-21 in error state ThriftEventSource | CassandraSink because sleep interrupted
> 2012-01-11 17:18:14,808 WARN org.apache.thrift.server.TSaneThreadPoolServer: Transport error occurred during acceptance of message.
> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>     at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:139)
>     at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>     at org.apache.thrift.server.TSaneThreadPoolServer$1.run(TSaneThreadPoolServer.java:175)
> Caused by: java.net.SocketException: Too many open files
>     at java.net.PlainSocketImpl.socketAccept(Native Method)
>     at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
>     at java.net.ServerSocket.implAccept(ServerSocket.java:462)
>     at java.net.ServerSocket.accept(ServerSocket.java:430)
>     at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:134)
>     ... 2 more
> 2012-01-11 17:18:14,809 WARN org.apache.thrift.server.TSaneThreadPoolServer: Transport error occurred during acceptance of message.
> org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
>     at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:139)
>     at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>     at org.apache.thrift.server.TSaneThreadPoolServer$1.run(TSaneThreadPoolServer.java:175)
> Caused by: java.net.SocketException: Too many open files
>     at java.net.PlainSocketImpl.socketAccept(Native Method)
>     at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
>     at java.net.ServerSocket.implAccept(ServerSocket.java:462)
>     at java.net.ServerSocket.accept(ServerSocket.java:430)
>     at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:134)
>     ... 2 more
> Note: Flume uses its own implementation of TThreadPoolServer (the
> TSaneThreadPoolServer in the log above), which is literally copied and pasted
> from the original Thrift source code. Flume embeds this part of the Thrift
> library in a massively multi-threaded environment.
> I was running out of socket file descriptors, as indicated by the "Too many
> open files" exception. This exception causes an infinite loop in this part of
> the serve() method:
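> while (!stopped_) {
>   int failureCount = 0;  // re-initialized on every iteration and never checked
>   try {
>     TTransport client = serverTransport_.accept();
>     WorkerProcess wp = new WorkerProcess(client);
>     executorService_.execute(wp);
>   } catch (TTransportException ttx) {
>     // while accept() keeps failing (e.g. "Too many open files"), every pass
>     // ends up here: the error is logged and the loop retries immediately
>     if (!stopped_) {
>       ++failureCount;
>       LOGGER.warn("Transport error occurred during acceptance of message.", ttx);
>     }
>   }
> }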
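A minimal sketch of how such a loop could avoid spinning forever. It assumes a hypothetical Acceptor interface standing in for TServerTransport.accept(); AcceptLoop, its constructor parameters and the backoff policy are illustrative choices, not Thrift API. The failure counter is declared outside the loop, reset by a successful accept, and bounded; between failures the loop sleeps with capped exponential backoff, so a process that has hit its file-descriptor limit gets time to release descriptors instead of logging the same error in a tight loop.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;

/**
 * Sketch of an accept loop that stops retrying after too many consecutive
 * accept() failures. Acceptor and AcceptLoop are hypothetical names, not
 * part of the Thrift API.
 */
public class AcceptLoop {
    /** Hypothetical stand-in for TServerTransport.accept(): returns a worker for one client. */
    interface Acceptor {
        Runnable accept() throws IOException;
    }

    private final Acceptor acceptor;
    private final ExecutorService workers;
    private final int maxConsecutiveFailures;
    private final long initialBackoffMillis;
    private final long maxBackoffMillis;
    private volatile boolean stopped = false;

    AcceptLoop(Acceptor acceptor, ExecutorService workers, int maxConsecutiveFailures,
               long initialBackoffMillis, long maxBackoffMillis) {
        this.acceptor = acceptor;
        this.workers = workers;
        this.maxConsecutiveFailures = maxConsecutiveFailures;
        this.initialBackoffMillis = initialBackoffMillis;
        this.maxBackoffMillis = maxBackoffMillis;
    }

    void stop() {
        stopped = true;
    }

    /**
     * Returns normally once stop() has been called; throws IOException when
     * accept() has failed maxConsecutiveFailures times in a row.
     */
    void serve() throws IOException, InterruptedException {
        int consecutiveFailures = 0;   // kept across iterations, unlike failureCount above
        long backoff = initialBackoffMillis;
        while (!stopped) {
            Runnable worker;
            try {
                worker = acceptor.accept();
            } catch (IOException e) {
                if (stopped) {
                    return;            // failure raced with stop(); exit quietly
                }
                consecutiveFailures++;
                if (consecutiveFailures >= maxConsecutiveFailures) {
                    throw new IOException(
                        "accept() failed " + consecutiveFailures + " times in a row", e);
                }
                Thread.sleep(backoff); // back off instead of spinning on the same error
                backoff = Math.min(backoff * 2, maxBackoffMillis);
                continue;
            }
            consecutiveFailures = 0;   // a successful accept resets the budget
            backoff = initialBackoffMillis;
            workers.execute(worker);
        }
    }
}
```

Throwing out of serve() hands the decision back to the caller, which can then clean up or restart. Whether to give up after N failures or to keep backing off indefinitely is a policy choice that depends on the deployment.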
> Furthermore, during an overnight run I ran out of disk space, because the
> logged exceptions made the log file grow dramatically. There was no way to
> recover.
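The log growth can be limited separately from the loop. A minimal sketch, assuming a hypothetical RateLimitedWarner helper (not part of Thrift or Flume) that writes at most one message per interval to a sink and appends the number of messages it suppressed in between; the clock is injected so the behavior can be tested:

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of Thrift or Flume): forwards at most one
 * message per interval to a sink and reports how many were suppressed.
 */
public class RateLimitedWarner {
    private final Consumer<String> sink;       // e.g. a logger's warning method
    private final LongSupplier clockMillis;    // e.g. System::currentTimeMillis
    private final long intervalMillis;
    private boolean emittedBefore = false;
    private long lastEmittedAt;
    private long suppressed = 0;

    public RateLimitedWarner(Consumer<String> sink, LongSupplier clockMillis, long intervalMillis) {
        this.sink = sink;
        this.clockMillis = clockMillis;
        this.intervalMillis = intervalMillis;
    }

    public synchronized void warn(String message) {
        long now = clockMillis.getAsLong();
        if (emittedBefore && now - lastEmittedAt < intervalMillis) {
            suppressed++;              // count the message, but don't write it
            return;
        }
        String suffix = suppressed > 0
            ? " (" + suppressed + " similar messages suppressed)"
            : "";
        sink.accept(message + suffix);
        emittedBefore = true;
        lastEmittedAt = now;
        suppressed = 0;
    }
}
```

With java.util.logging, for example, it could be constructed as new RateLimitedWarner(logger::warning, System::currentTimeMillis, 60_000). One limitation of this simple version: a suppressed count is only reported when the next message gets through.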
> Critical exceptions like these never terminate the while loop; it can only
> be ended by calling the stop() method.
> The general question is: how should exceptions like the ones described above
> be handled? I can't even catch the exception, because it is only logged and
> not handled in any way. So there is no way to react, for example by doing
> some cleanup or restarting the server.
> Best Regards
> Bert Passek
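One way to react to such failures is to let serve() give up with an exception and supervise it from outside. The sketch below assumes hypothetical Server and ServerFactory interfaces (a serve() that throws instead of looping forever) and a caller-supplied cleanup hook; none of these names exist in Thrift 0.7.

```java
/**
 * Hypothetical supervisor (not part of Thrift): runs a server, and when serve()
 * fails, runs a cleanup hook and starts a fresh server, up to maxRestarts times.
 */
public class ServerSupervisor {
    /** Stand-in for a server whose serve() throws instead of looping forever. */
    interface Server {
        void serve() throws Exception;
    }

    /** Creates a new server instance for each attempt. */
    interface ServerFactory {
        Server create() throws Exception;
    }

    /**
     * Returns the number of restarts once serve() returns normally (e.g. after
     * stop()); rethrows the last failure when the restart budget is used up.
     */
    static int supervise(ServerFactory factory, Runnable cleanup, int maxRestarts)
            throws Exception {
        int restarts = 0;
        while (true) {
            try {
                factory.create().serve();
                return restarts;
            } catch (Exception e) {
                cleanup.run();         // application-specific: release resources, send an alert
                if (restarts >= maxRestarts) {
                    throw e;
                }
                restarts++;
            }
        }
    }
}
```

Creating a fresh server through the factory on every attempt keeps the restart logic independent of whatever state the failed server was left in; the cleanup hook is where the application could release descriptors held elsewhere in the process before trying again.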
--
This message is automatically generated by JIRA.