Source/sink.close() is not invoked by DirectDriver.
---------------------------------------------------

                 Key: FLUME-808
                 URL: https://issues.apache.org/jira/browse/FLUME-808
             Project: Flume
          Issue Type: Bug
          Components: Node, Sinks+Sources
    Affects Versions: v0.9.4
            Reporter: Mingjie Lai


I just noticed an issue that DirectDriver doesn't really call the 
sink/source.close() at all, if flume node gets reconfigured. 

Very easy to reproduce: 
1. start a master, and node at the same machine
2. configure the node thru master, with any configure, e.g.: tail("/tmp/aaa"), 
null
3. reconfigure the node thru master: collectorSource(12345), null

The source.close() didn't get called at all for reconfigure, or rolled. And 
later on, Driver will decide to interrupt the thread since it seems not be able 
to be closed gracefully. 

{code}
2011-10-19 22:46:03,646 [Heartbeat] WARN agent.LivenessManager: Heartbeats are 
backing up, currently behind by 1 heartbeats
2011-10-19 22:46:08,648 [Heartbeat] WARN agent.LivenessManager: Heartbeats are 
backing up, currently behind by 2 heartbeats
2011-10-19 22:46:13,650 [Heartbeat] WARN agent.LivenessManager: Heartbeats are 
backing up, currently behind by 3 heartbeats
2011-10-19 22:46:18,652 [Heartbeat] WARN agent.LivenessManager: Heartbeats are 
backing up, currently behind by 4 heartbeats
2011-10-19 22:46:23,648 [Check config] ERROR agent.LogicalNode: Forcing driver 
to exit uncleanly
2011-10-19 22:46:23,648 [logicalNode c1-19] ERROR connector.DirectDriver: 
Closing down due to exception during append calls
2011-10-19 22:46:23,648 [Check config] INFO agent.LogicalNode: Node config 
successfully set to com.cloudera.flume.conf.FlumeConfigData@39a2f02e
2011-10-19 22:46:23,648 [logicalNode c1-19] INFO connector.DirectDriver: 
Connector logicalNode c1-19 exited with error: Waiting for queue element was 
interrupted! null
{code}

It may be brought by Flume-596 which is a big refactor of Driver. I tried to 
git-reset to 21b74010c34cef9a977c75ab5dec4dc747d8f5aa, and cannot reproduce the 
problem. 

Expected result -- source should be close()'ed
{code}
2011-10-19 23:20:45,811 [SpawningLogicalNode c1] INFO 
collector.CollectorSource: closed
2011-10-19 23:20:45,812 [SpawningLogicalNode c1] INFO thrift.ThriftEventSource: 
Closed server on port 35853...
2011-10-19 23:20:45,817 [SpawningLogicalNode c1] INFO thrift.ThriftEventSource: 
Queue still has 0 elements ...
2011-10-19 23:20:45,852 [logicalNode c1-20] INFO collector.CollectorSource: 
closed
2011-10-19 23:20:45,852 [logicalNode c1-20] INFO thrift.ThriftEventSource: 
Closed server on port 35853...
2011-10-19 23:20:45,853 [logicalNode c1-20] INFO thrift.ThriftEventSource: 
Queue still has 0 elements ...
2011-10-19 23:20:45,853 [logicalNode c1-20] INFO collector.CollectorSource: 
closed
2011-10-19 23:20:45,853 [logicalNode c1-20] INFO thrift.ThriftEventSource: 
Closed server on port 35853...
2011-10-19 23:20:45,853 [logicalNode c1-20] INFO thrift.ThriftEventSource: 
Queue still has 0 elements ...
2011-10-19 23:20:45,853 [logicalNode c1-20] INFO agent.LogicalNode: c1: 
Connector stopped: CollectorSource | NullSink
2011-10-19 23:20:45,853 [SpawningLogicalNode c1] INFO agent.LogicalNode: Node 
config successfully set to com.cloudera.flume.conf.FlumeConfigData@2682d210
2011-10-19 23:20:45,864 [logicalNode c1-23] INFO agent.LogicalNode: Connector 
started: TailSource | NullSink

{code}

It might be the root cause of FLUME-798, and related to all the recent 
Interrupted exception discussion on user@.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to