Hi,

# cat /etc/security/limits.conf
flume soft nofile 5000
flume hard nofile 5000
# cat /etc/sysctl.conf
fs.file-max=200000

Can you try these settings? Max open files 1024 is the default value and is designed for small servers / PCs.

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Jan 26, 2012, at 6:04 PM, Frank Grimes wrote:

> It's 1024, but we really shouldn't need to up that value... doing so would
> just delay the failure.
>
>
> On 2012-01-26, at 11:57 AM, Zijad Purkovic wrote:
>
>> Hi Frank,
>>
>> Can you show output of ulimit -n from your collector node?
>>
>> On Thu, Jan 26, 2012 at 4:51 PM, Frank Grimes <frankgrime...@yahoo.com>
>> wrote:
>>> Hi All,
>>>
>>> We are using flume-0.9.5
>>> (specifically,
>>> http://svn.apache.org/repos/asf/incubator/flume/trunk@1179275)
>>> and occasionally our Collector node accumulates too many open TCP
>>> connections and starts madly logging the following errors:
>>>
>>> WARN org.apache.thrift.server.TSaneThreadPoolServer: Transport error
>>> occurred during acceptance of message.
>>> org.apache.thrift.transport.TTransportException: java.net.SocketException:
>>> Too many open files
>>> at
>>> org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:139)
>>> at
>>> org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>>> at
>>> org.apache.thrift.server.TSaneThreadPoolServer$1.run(TSaneThreadPoolServer.java:175)
>>> Caused by: java.net.SocketException: Too many open files
>>> at java.net.PlainSocketImpl.socketAccept(Native Method)
>>> at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
>>> at java.net.ServerSocket.implAccept(ServerSocket.java:462)
>>> at java.net.ServerSocket.accept(ServerSocket.java:430)
>>> at
>>> org.apache.thrift.transport.TSaneServerSocket.acceptImpl(TSaneServerSocket.java:134)
>>> ... 2 more
>>>
>>>
>>> This quickly fills up the disk as the log file grows to multiple gigabytes
>>> in size.
>>>
>>> After some investigation, it appears that even though the Agent nodes show
>>> single open connections to the Collector, the Collector node appears to have
>>> a bunch of zombie TCP connections open back to the Agent nodes.
>>> i.e.
>>> "lsof -n | grep PORT" on the Agent node shows 1 established connection
>>> However, the Collector node shows hundreds of established connections for
>>> that same port which don't seem to tie up to any connections I can find on
>>> the Agent node.
>>>
>>> So we're concluding that the Collector node is somehow leaking connections.
>>>
>>> Has anyone seen this kind of thing before?
>>>
>>> Could this be related to https://issues.apache.org/jira/browse/FLUME-857?
>>> Or could this be a Thrift bug that could be avoided by switching to Avro
>>> sources/sinks?
>>>
>>> Any hints/tips are most welcome.
>>>
>>> Thanks,
>>>
>>> Frank Grimes
>>
>>
>>
>> --
>> Zijad Purković
>> Dobrovoljnih davalaca krvi 3/19, Zavidovići
>> 061/ 690 - 241
>
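
The thread turns on two numbers: the open-file limit of the process (1024 here) and how many descriptors it actually holds. The following is a minimal sketch of how both could be read from inside a process, assuming a Linux host with /proc mounted. The function names (nofile_limits, open_fd_count, headroom) are hypothetical and not part of Flume or Thrift. If the limit is raised and the open count still climbs steadily, that would be consistent with the leak Frank describes rather than with an undersized limit.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open descriptors of a process via /proc (Linux only).

    For pid="self" the count includes the descriptor that is briefly
    opened to read the directory itself.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def headroom(used, soft_limit):
    """Fraction of the soft limit still unused, clamped to 0.0..1.0."""
    if soft_limit == resource.RLIM_INFINITY:
        return 1.0  # no limit configured
    if soft_limit <= 0:
        return 0.0
    return max(0.0, 1.0 - used / soft_limit)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    used = open_fd_count()
    print(f"soft={soft} hard={hard} open={used} "
          f"headroom={headroom(used, soft):.1%}")
```

Running this periodically (or passing the Collector's PID to open_fd_count, given sufficient permissions) gives a rough trend line; it does not identify which connections are stale.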