[ 
https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275659#comment-15275659
 ] 

Siddharth Ahuja commented on FLUME-2905:
----------------------------------------

Attached a patch that invokes stop() when a BindException is thrown because the 
port is already in use, which stops the source and cleans up the socket.
Also moved the creation of the thread pool closer to where it is used, so that 
it does not have to be cleaned up in stop().
A JUnit test is also provided.
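
A minimal sketch of the cleanup pattern described above. It is not the code 
from the attached patch: BindCleanupSketch and isListening() are hypothetical 
names, and the class is a simplified stand-in for a source rather than 
NetcatSource itself. It only illustrates closing the channel when bind() fails.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Hypothetical stand-in for a source; not the actual NetcatSource code.
class BindCleanupSketch {
    private final InetSocketAddress bindPoint;
    private ServerSocketChannel serverSocket;

    BindCleanupSketch(InetSocketAddress bindPoint) {
        this.bindPoint = bindPoint;
    }

    // Opens and binds the channel; on failure, releases it before rethrowing.
    void start() throws IOException {
        try {
            serverSocket = ServerSocketChannel.open();
            serverSocket.socket().bind(bindPoint);
        } catch (IOException e) {
            stop(); // close the half-initialised channel so no descriptor leaks
            throw e;
        }
    }

    // Safe to call repeatedly, including after a failed start().
    void stop() {
        if (serverSocket != null) {
            try {
                serverSocket.close();
            } catch (IOException e) {
                // Nothing more can be done here; the sketch drops the error.
            } finally {
                serverSocket = null;
            }
        }
    }

    // True only while an open channel is held.
    boolean isListening() {
        return serverSocket != null && serverSocket.isOpen();
    }
}
```

Because stop() tolerates a channel that was opened but never bound, a caller 
that retries start() should not accumulate one open socket per failed attempt.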

Tested the patch as follows:
        • Started the HDFS (DataNode) service, which binds to port 50010, so 
the port is in use.
        • Started the updated flume-agent with a configuration containing a 
Netcat source and a File Roll sink, with the bind port set to 50010.
        • Checked the flume logs and found the BindException caused by the 
port already being in use.
        • Ran "ps auxx|grep -i flume" to get the flume process id.
        • Ran "lsof -p <flume_proc_id> | wc -l" multiple times to check whether 
the number of file descriptors was increasing. It remained stable.
        • Stopped the HDFS service to free up port 50010.
        • Saw in the flume-agent logs that the source then started 
successfully with the socket bound to 50010.
        • From a new terminal, ran "nc localhost 50010" and entered some text.
        • The text was written to the local filesystem successfully.
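
The descriptor check in the steps above can also be approximated from inside a 
JVM. The sketch below is only an illustration and was not part of the testing 
described here: FdCountSketch and its methods are hypothetical names, and it 
assumes a Unix-like JVM that exposes com.sun.management.UnixOperatingSystemMXBean 
(on other platforms openFdCount() returns -1). It repeatedly fails to bind to 
an occupied loopback port, closes the channel each time, and reports how much 
the open descriptor count changed.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.ServerSocketChannel;

// Hypothetical helper; not part of Flume or of the attached patch.
class FdCountSketch {
    // Returns this JVM's open descriptor count, or -1 if the platform bean is unavailable.
    static long openFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1;
    }

    // Attempts to bind to an occupied address `attempts` times, closing the
    // channel after every attempt, and returns the change in descriptor count.
    static long fdGrowthAfterFailedBinds(int attempts) throws IOException {
        try (ServerSocket occupant =
                     new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            InetSocketAddress busy = new InetSocketAddress(
                    occupant.getInetAddress(), occupant.getLocalPort());
            long before = openFdCount();
            for (int i = 0; i < attempts; i++) {
                try (ServerSocketChannel ch = ServerSocketChannel.open()) {
                    ch.socket().bind(busy);
                } catch (IOException expected) {
                    // The port is in use; try-with-resources has already closed ch.
                }
            }
            return openFdCount() - before;
        }
    }
}
```

With the channel closed on every failed attempt, the returned growth should 
stay close to zero. Without the close, it would be expected to grow roughly in 
step with the number of attempts, which is the behaviour the lsof count showed 
before the patch.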


> NetcatSource - Socket not closed when an exception is encountered during 
> start() leading to file descriptor leaks
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-2905
>                 URL: https://issues.apache.org/jira/browse/FLUME-2905
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.6.0
>            Reporter: Siddharth Ahuja
>            Assignee: Siddharth Ahuja
>         Attachments: FLUME-2905-0.patch
>
>
> During the flume agent start-up, the flume configuration containing the 
> NetcatSource is parsed and the source's start() is called. If binding the 
> channel's socket to a local address fails while configuring the socket to 
> listen for connections, the following exception is thrown, but the socket 
> opened just before is not closed.
> {code}
> 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - Exception follows.
> org.apache.flume.FlumeException: java.net.BindException: Address already in use
>         at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173)
>         at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
>         at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.BindException: Address already in use
>         at sun.nio.ch.Net.bind0(Native Method)
>         at sun.nio.ch.Net.bind(Net.java:444)
>         at sun.nio.ch.Net.bind(Net.java:436)
>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
>         at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167)
>         ... 9 more
> {code}
> The source's start() is then called again, which opens another socket that 
> is also never closed, and so on. This leads to file descriptor (socket) leaks.
> This can be easily reproduced as follows:
> 1. Set Netcat as the source in the flume agent configuration.
> 2. Set the bind port for the netcat source to a port that is already in use, 
> e.g. in my case I used 50010, which is the port for the DataNode's XCeiver 
> Protocol used by the HDFS service.
> 3. Start the flume agent and run "lsof -p <flume_process_id> | wc -l" 
> repeatedly. Notice that the file descriptor count keeps growing due to 
> socket leaks, with errors like "can't identify protocol".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
