Possible infinite loop in TThreadPoolServer
-------------------------------------------
Key: THRIFT-1493
URL: https://issues.apache.org/jira/browse/THRIFT-1493
Project: Thrift
Issue Type: Bug
Components: Java - Library
Affects Versions: 0.7
Environment: Debian Squeeze
Reporter: bert Passek
I just ran into a major problem using Thrift together with Flume, and the
problem could be tracked down to the Thrift library itself.
I'm using Thrift in a typical client/server environment to track large volumes
of data. We ran into an exception that looks basically like this:
2012-01-11 14:57:30,487 ERROR com.cloudera.flume.core.connector.DirectDriver: Exiting driver logicalNode newsletterImpressionLog01-21 in error state ThriftEventSource | CassandraSink because sleep interrupted
2012-01-11 17:18:14,808 WARN org.apache.thrift.server.TSaneThreadPoolServer: Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
    at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:139)
    at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
    at org.apache.thrift.server.TSaneThreadPoolServer$1.run(TSaneThreadPoolServer.java:175)
Caused by: java.net.SocketException: Too many open files
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
    at java.net.ServerSocket.implAccept(ServerSocket.java:462)
    at java.net.ServerSocket.accept(ServerSocket.java:430)
    at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:134)
    ... 2 more
2012-01-11 17:18:14,809 WARN org.apache.thrift.server.TSaneThreadPoolServer: Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
    at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:139)
    at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
    at org.apache.thrift.server.TSaneThreadPoolServer$1.run(TSaneThreadPoolServer.java:175)
Caused by: java.net.SocketException: Too many open files
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
    at java.net.ServerSocket.implAccept(ServerSocket.java:462)
    at java.net.ServerSocket.accept(ServerSocket.java:430)
    at org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:134)
    ... 2 more
Note: Flume uses its own implementation of TThreadPoolServer (the
TSaneThreadPoolServer in the trace above), which is literally copied and pasted
from the original Thrift source code. Flume embeds this part of the Thrift
library in a massively multi-threaded environment.
I ran out of file descriptors for socket connections, as indicated by the
exception "Too many open files". This exception causes an infinite loop in this
part of the serve() method:
while (!stopped_) {
  // failureCount is re-declared (reset to 0) on every iteration and is never
  // checked, so repeated accept() failures can never end this loop.
  int failureCount = 0;
  try {
    TTransport client = serverTransport_.accept();
    WorkerProcess wp = new WorkerProcess(client);
    executorService_.execute(wp);
  } catch (TTransportException ttx) {
    if (!stopped_) {
      ++failureCount;
      LOGGER.warn("Transport error occurred during acceptance of message.", ttx);
    }
  }
}
Furthermore, during an overnight run I ran out of disk space because the
repeatedly logged exceptions inflated the log file dramatically. There was no
way to recover.
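One way to keep a persistent accept() error from filling the disk is to rate-limit the repeated warning. The sketch below is a minimal illustration, assuming a hypothetical RateLimitedWarner class (not part of Thrift or Flume) and a caller-supplied timestamp so the behavior is deterministic: it emits at most one message per time window and reports how many repeats were suppressed in between.

```java
import java.util.function.Consumer;

/** Hypothetical helper (not Thrift or Flume API): at most one warning per time window. */
public class RateLimitedWarner {
    private final long windowMillis;
    private final Consumer<String> sink; // e.g. forwards to the real logger
    private boolean hasLogged = false;
    private long windowStart;
    private int suppressed = 0;

    public RateLimitedWarner(long windowMillis, Consumer<String> sink) {
        this.windowMillis = windowMillis;
        this.sink = sink;
    }

    /** nowMillis is supplied by the caller, e.g. System.currentTimeMillis(). */
    public synchronized void warn(String message, long nowMillis) {
        if (!hasLogged || nowMillis - windowStart >= windowMillis) {
            // First message, or the window has elapsed: emit, with a repeat count if any.
            String suffix = suppressed > 0
                ? " (" + suppressed + " similar messages suppressed)"
                : "";
            sink.accept(message + suffix);
            hasLogged = true;
            windowStart = nowMillis;
            suppressed = 0;
        } else {
            // Inside the current window: swallow it, but count it for the next message.
            suppressed++;
        }
    }
}
```

In an accept loop like the one above, a call such as warner.warn("Transport error occurred during acceptance of message.", System.currentTimeMillis()) would stand in for the direct LOGGER.warn call; this limits log growth but does not by itself end the loop.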
Once a persistent exception like this occurs, the while loop never terminates;
the only way to end it is to call the stop() method.
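In the loop quoted above, failureCount is reset on every iteration and never compared against anything, so accept() is retried immediately and forever. The following is a minimal, self-contained sketch of one possible mitigation, not Thrift's actual code: a hypothetical Acceptor interface stands in for TServerTransport.accept(), and MAX_CONSECUTIVE_FAILURES and the backoff cap are illustrative values. The counter lives outside the loop, is reset on success, grows an exponential backoff between failures, and ends serve() after a threshold.

```java
import java.util.concurrent.ExecutorService;

public class BoundedAcceptLoop {
    /** Hypothetical stand-in for TServerTransport.accept(); not Thrift API. */
    public interface Acceptor {
        Runnable accept() throws Exception;
    }

    /** Hypothetical threshold: give up after this many accept() failures in a row. */
    public static final int MAX_CONSECUTIVE_FAILURES = 5;

    /** Hypothetical cap on the delay between retries. */
    private static final long MAX_BACKOFF_MILLIS = 1000;

    private volatile boolean stopped = false;

    public void stop() {
        stopped = true;
    }

    /**
     * Accepts connections until stop() is called or accept() fails
     * MAX_CONSECUTIVE_FAILURES times in a row. Returns the number of
     * consecutive failures at exit (0 after a normal stop).
     */
    public int serve(Acceptor acceptor, ExecutorService executor) {
        int consecutiveFailures = 0; // declared outside the loop, unlike failureCount
        long backoffMillis = 1;
        while (!stopped) {
            try {
                Runnable worker = acceptor.accept();
                consecutiveFailures = 0; // a successful accept resets the counter
                backoffMillis = 1;
                executor.execute(worker);
            } catch (Exception e) {
                if (stopped) {
                    break;
                }
                consecutiveFailures++;
                System.err.println("accept failed (" + consecutiveFailures + " in a row): " + e.getMessage());
                if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
                    return consecutiveFailures; // hand control back instead of spinning forever
                }
                try {
                    Thread.sleep(backoffMillis); // back off so a persistent error is not a hot loop
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return consecutiveFailures;
                }
                backoffMillis = Math.min(backoffMillis * 2, MAX_BACKOFF_MILLIS);
            }
        }
        return 0;
    }
}
```

The threshold counts consecutive failures only, so an occasional accept() error under normal load does not shut the server down; whether to give up, keep retrying with a longer backoff, or escalate is a policy decision left to the caller.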
The general question is: how should exceptions like the one above be handled?
I can't even catch the exception, because it is only logged and not propagated
or handled in any other way. So there is no way to react, for example by doing
some cleanup or restarting the server.
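One way to give the caller a chance to react is to report accept() failures through a callback instead of only logging them. The sketch below is a minimal illustration under stated assumptions: Acceptor and AcceptFailureListener are hypothetical interfaces, not Thrift API, and the worker runs inline only to keep the example short. The listener sees each failure and the current run length and decides whether to keep serving; if it declines, serve() returns the exception so the caller can clean up or restart.

```java
public class ListenedAcceptLoop {
    /** Hypothetical stand-in for TServerTransport.accept(); not Thrift API. */
    public interface Acceptor {
        Runnable accept() throws Exception;
    }

    /** Hypothetical hook: return true to keep serving, false to stop and report. */
    public interface AcceptFailureListener {
        boolean onAcceptFailure(Exception cause, int consecutiveFailures);
    }

    private volatile boolean stopped = false;

    public void stop() {
        stopped = true;
    }

    /**
     * Runs the accept loop. Returns the exception that made the listener
     * stop serving, or null if the loop ended because stop() was called.
     */
    public Exception serve(Acceptor acceptor, AcceptFailureListener listener) {
        int consecutiveFailures = 0;
        while (!stopped) {
            Runnable worker;
            try {
                worker = acceptor.accept();
            } catch (Exception e) {
                if (stopped) {
                    return null;
                }
                consecutiveFailures++;
                if (!listener.onAcceptFailure(e, consecutiveFailures)) {
                    return e; // surface the failure so the caller can clean up or restart
                }
                continue;
            }
            consecutiveFailures = 0; // only a successful accept resets the run
            worker.run(); // inline for brevity; a real server would hand this to an executor
        }
        return null;
    }
}
```

A caller could, for instance, return false after a few consecutive "Too many open files" errors, release resources, and then start a new server instance, rather than leaving the loop to spin inside serve().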
Best Regards
Bert Passek