Siddharth Ahuja created FLUME-2905:
--------------------------------------

             Summary: NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks
                 Key: FLUME-2905
                 URL: https://issues.apache.org/jira/browse/FLUME-2905
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: v1.6.0
            Reporter: Siddharth Ahuja


During flume agent start-up, the flume configuration containing the
NetcatSource is parsed and the source's start() is called. If binding the
channel's socket to a local address (so that it can listen for connections)
fails, the following exception is thrown, but the socket opened just before the
bind is never closed.

2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - Exception follows.
org.apache.flume.FlumeException: java.net.BindException: Address already in use
        at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173)
        at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:444)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
        at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167)
        ... 9 more

The LifecycleSupervisor then calls the source's start() again, which opens
another socket that is also left open, and so on. Every failed retry therefore
leaks another file descriptor (socket).
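The repeated-start leak can be illustrated outside Flume with plain java.nio. This is a minimal sketch, not Flume's actual code: it assumes start() opens a ServerSocketChannel and then binds it through socket().bind(...), which is what the ServerSocketAdaptor.bind frames in the trace suggest. The class BindLeakDemo, its methods, and the retry count are hypothetical. startLeaky lets the bind failure escape with the channel still open, while startFixed closes the channel before rethrowing. Every channel is also recorded in a list only so the demo can count how many remain open after the simulated retries.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.util.ArrayList;
import java.util.List;

public class BindLeakDemo {

    // Hypothetical stand-in for the buggy start(): if bind() throws, the channel is never closed.
    static ServerSocketChannel startLeaky(InetSocketAddress addr, List<ServerSocketChannel> opened)
            throws IOException {
        ServerSocketChannel ch = ServerSocketChannel.open();
        opened.add(ch);          // recorded only so the demo can inspect it later
        ch.socket().bind(addr);  // throws BindException when the port is already taken
        return ch;
    }

    // Hypothetical fixed variant: close the channel before propagating the bind failure.
    static ServerSocketChannel startFixed(InetSocketAddress addr, List<ServerSocketChannel> opened)
            throws IOException {
        ServerSocketChannel ch = ServerSocketChannel.open();
        opened.add(ch);
        try {
            ch.socket().bind(addr);
            return ch;           // success: the caller now owns the open channel
        } catch (IOException e) {
            try {
                ch.close();      // release the descriptor opened above
            } catch (IOException closeError) {
                e.addSuppressed(closeError);
            }
            throw e;
        }
    }

    interface Starter {
        ServerSocketChannel start(InetSocketAddress addr, List<ServerSocketChannel> opened)
                throws IOException;
    }

    // Simulates a supervisor that keeps retrying start() against a busy port and
    // returns how many of the opened channels are still open afterwards.
    static int leftOpenAfterRetries(Starter starter, InetSocketAddress busy, int attempts)
            throws IOException {
        List<ServerSocketChannel> opened = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            try {
                starter.start(busy, opened);
            } catch (BindException expected) {
                // "Address already in use": the retry loop simply tries again
            }
        }
        int stillOpen = 0;
        for (ServerSocketChannel ch : opened) {
            if (ch.isOpen()) {
                stillOpen++;
            }
            ch.close();          // clean up so the demo itself does not leak
        }
        return stillOpen;
    }

    // Occupies an ephemeral loopback port (standing in for the busy port 50010)
    // and returns {leftOpenByLeaky, leftOpenByFixed}.
    static int[] run(int attempts) throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocketChannel holder = ServerSocketChannel.open()) {
            holder.socket().bind(new InetSocketAddress(lo, 0));
            InetSocketAddress busy = new InetSocketAddress(lo, holder.socket().getLocalPort());
            return new int[] {
                leftOpenAfterRetries(BindLeakDemo::startLeaky, busy, attempts),
                leftOpenAfterRetries(BindLeakDemo::startFixed, busy, attempts)
            };
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = run(5);
        System.out.println("leaky: " + r[0] + " left open, fixed: " + r[1] + " left open");
    }
}
```

With five retries the leaky variant should leave all five channels open and the fixed variant none. The close happens in a catch block rather than a finally or try-with-resources because a successfully bound channel must stay open for the caller. Flume itself wraps the IOException in a FlumeException (NetcatSource.java:173 in the trace); the sketch just rethrows it.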

This can be easily reproduced as follows:
1. Set Netcat as the source in the flume agent configuration.
2. Set the netcat source's bind port to a port that is already in use,
e.g. in my case I used 50010, the port of the DataNode's XCeiver protocol
used by the HDFS service.
3. Start the flume agent and run "lsof -p <flume_process_id> | wc -l" a few
times. The file descriptor count keeps growing because of the leaked sockets,
which lsof lists with entries like "can't identify protocol".
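Step 3 watches the descriptor count from outside the process with lsof. On a HotSpot/OpenJDK JVM running on Linux or macOS, the same number can be read from inside the process through the JDK-specific com.sun.management.UnixOperatingSystemMXBean.getOpenFileDescriptorCount(). The sketch below assumes such a JVM; the class FdWatch and the count of 20 channels are hypothetical. It opens channels without closing them, much as each failed start() does, and reports the count before the leak, after the leak, and after cleanup.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.nio.channels.ServerSocketChannel;
import java.util.ArrayList;
import java.util.List;

public class FdWatch {

    // Returns the JVM's open file descriptor count, or -1 if the platform MXBean
    // does not expose it (e.g. on Windows or a non-HotSpot JVM).
    static long openFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    // Opens n channels and keeps them open, like n failed start() calls that never
    // close their socket, then closes them. Returns {before, afterLeak, afterCleanup}.
    static long[] measure(int n) throws IOException {
        long before = openFds();
        List<ServerSocketChannel> leaked = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            leaked.add(ServerSocketChannel.open());  // one socket descriptor each
        }
        long afterLeak = openFds();
        for (ServerSocketChannel ch : leaked) {
            ch.close();
        }
        long afterCleanup = openFds();
        return new long[] {before, afterLeak, afterCleanup};
    }

    public static void main(String[] args) throws IOException {
        long[] m = measure(20);
        System.out.printf("before=%d afterLeak=%d afterCleanup=%d%n", m[0], m[1], m[2]);
    }
}
```

The exact numbers depend on the JVM; the JDK may keep an extra internal descriptor after the first channel operation, so afterCleanup can sit slightly above before. The point is the trend: the count climbs with every unclosed channel, just as the lsof output does for the flume agent.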




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
