[ 
https://issues.apache.org/jira/browse/FLUME-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058287#comment-16058287
 ] 

Attila Simon commented on FLUME-2905:
-------------------------------------

Hi [~sahuja],
JUnit passed for me, but I haven't run the tests exhaustively yet. I checked 
FLUME-2905-4.patch and have some minor suggestions. Would you mind 
uploading it to https://reviews.apache.org/ for a code review? Please let me 
know if you have trouble with it (I'm also happy to upload it there for you). 

In a nutshell:
* I would recommend calling stop() after writing out the exception. 
* I would recommend removing the return statement from the part of stop() 
that catches an Exception from the socket close. It currently prevents 
the rest of stop() from executing, including super.stop(), which then can't 
set the lifecycle state.
* The test currently checks that stop() was executed properly by expecting the 
source to be in the LifecycleState.STOP state. I think that is fine, since the 
fact that NetcatSource opens a ServerSocket is internal to it. What we have to 
check is whether the source is stopped on error, and the test does check this. 
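
To illustrate the first two points, here is a minimal, self-contained sketch. 
It is not the NetcatSource code or the patch: SketchSource, its State enum and 
isSocketOpen() are hypothetical stand-ins, and the plain state assignment at 
the end of stop() stands in for super.stop(). It only shows the pattern of 
calling stop() from the error path in start() and of not returning early when 
the socket close fails.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.ServerSocketChannel;

/** Hypothetical stand-in for a source with a lifecycle; NOT Flume's NetcatSource. */
class SketchSource {
    enum State { IDLE, START, STOP }

    private final InetSocketAddress bindAddress;
    private ServerSocketChannel serverSocket;
    private State state = State.IDLE;

    SketchSource(InetSocketAddress bindAddress) {
        this.bindAddress = bindAddress;
    }

    void start() {
        try {
            serverSocket = ServerSocketChannel.open();
            serverSocket.socket().bind(bindAddress);
        } catch (IOException e) {
            // Write out the exception first, then release the socket via stop().
            System.err.println("Unable to bind to " + bindAddress + ": " + e);
            stop();
            throw new IllegalStateException(e);
        }
        state = State.START;
    }

    void stop() {
        if (serverSocket != null) {
            try {
                serverSocket.close();
            } catch (IOException e) {
                // No early return here: the rest of stop() must still run.
                System.err.println("Unable to close socket: " + e);
            }
        }
        // Assumption: stands in for super.stop(), which sets the lifecycle state.
        state = State.STOP;
    }

    State getState() {
        return state;
    }

    boolean isSocketOpen() {
        return serverSocket != null && serverSocket.isOpen();
    }

    public static void main(String[] args) throws IOException {
        // Occupy an ephemeral port so that the bind in start() fails.
        try (ServerSocket occupied =
                 new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            SketchSource source = new SketchSource(new InetSocketAddress(
                InetAddress.getLoopbackAddress(), occupied.getLocalPort()));
            try {
                source.start();
            } catch (IllegalStateException expected) {
                System.out.println("state=" + source.getState()
                    + ", socketOpen=" + source.isSocketOpen());
            }
        }
    }
}
```

With this shape, a failed bind leaves the source stopped and the channel 
closed, which is the same condition the test checks through the lifecycle state.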

As a side note: it seems that startSource() doesn't behave as expected 
(it should pick a new port on error). This is a separate issue, so I guess it 
should be fixed in a separate JIRA/PR.

> NetcatSource - Socket not closed when an exception is encountered during 
> start() leading to file descriptor leaks
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-2905
>                 URL: https://issues.apache.org/jira/browse/FLUME-2905
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: 1.6.0
>            Reporter: Siddharth Ahuja
>            Assignee: Siddharth Ahuja
>         Attachments: FLUME-2905-0.patch, FLUME-2905-1.patch, 
> FLUME-2905-2.patch, FLUME-2905-3.patch, FLUME-2905-4.patch
>
>
> During flume agent start-up, the flume configuration containing the 
> NetcatSource is parsed and the source's start() is called. If there is an 
> issue while binding the channel's socket to a local address to configure the 
> socket to listen for connections, the following exception is thrown, but the 
> socket opened just before is not closed. 
> {code}
> 2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: 
> Unable to start EventDrivenSourceRunner: { 
> source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - 
> Exception follows.
> org.apache.flume.FlumeException: java.net.BindException: Address already in 
> use
>         at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173)
>         at 
> org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
>         at 
> org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.BindException: Address already in use
>         at sun.nio.ch.Net.bind0(Native Method)
>         at sun.nio.ch.Net.bind(Net.java:444)
>         at sun.nio.ch.Net.bind(Net.java:436)
>         at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
>         at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167)
>         ... 9 more
> {code}
> The source's start() is then called again leading to another socket being 
> opened but not closed and so on. This leads to file descriptor (socket) leaks.
> This can be easily reproduced as follows:
> 1. Set Netcat as the source in flume agent configuration.
> 2. Set the bind port for the netcat source to a port which is already in use. 
> e.g. in my case I used 50010 which is the port for DataNode's XCeiver 
> Protocol in use by the HDFS service.
> 3. Start the flume agent and run "lsof -p <flume_process_id> | wc -l". Notice 
> that the number of file descriptors keeps growing due to socket leaks, with 
> lsof entries like "can't identify protocol".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
