[ https://issues.apache.org/jira/browse/FLUME-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashish Paliwal resolved FLUME-808.
----------------------------------
       Resolution: Won't Fix
    Fix Version/s: v0.9.5

Won't fix. The 0.X branch is not maintained anymore.

> Source/sink.close() is not invoked by DirectDriver if source.next() is blocking.
> --------------------------------------------------------------------------------
>
>                 Key: FLUME-808
>                 URL: https://issues.apache.org/jira/browse/FLUME-808
>             Project: Flume
>          Issue Type: Bug
>          Components: Node, Sinks+Sources
>    Affects Versions: v0.9.4
>            Reporter: Mingjie Lai
>             Fix For: v0.9.5
>
>
> I just noticed that DirectDriver doesn't call sink/source.close() at all if the flume node gets reconfigured.
> Very easy to reproduce:
> 1. start a master and a node on the same machine
> 2. configure the node thru master, with any configuration, e.g.: tail("/tmp/aaa"), null
> 3. reconfigure the node thru master: collectorSource(12345), null
> The source.close() didn't get called at all for reconfigure, or rolled. And later on, Driver will decide to interrupt the thread since it does not seem to be able to close gracefully.
> {code}
> 2011-10-19 22:46:03,646 [Heartbeat] WARN agent.LivenessManager: Heartbeats are backing up, currently behind by 1 heartbeats
> 2011-10-19 22:46:08,648 [Heartbeat] WARN agent.LivenessManager: Heartbeats are backing up, currently behind by 2 heartbeats
> 2011-10-19 22:46:13,650 [Heartbeat] WARN agent.LivenessManager: Heartbeats are backing up, currently behind by 3 heartbeats
> 2011-10-19 22:46:18,652 [Heartbeat] WARN agent.LivenessManager: Heartbeats are backing up, currently behind by 4 heartbeats
> 2011-10-19 22:46:23,648 [Check config] ERROR agent.LogicalNode: Forcing driver to exit uncleanly
> 2011-10-19 22:46:23,648 [logicalNode c1-19] ERROR connector.DirectDriver: Closing down due to exception during append calls
> 2011-10-19 22:46:23,648 [Check config] INFO agent.LogicalNode: Node config successfully set to com.cloudera.flume.conf.FlumeConfigData@39a2f02e
> 2011-10-19 22:46:23,648 [logicalNode c1-19] INFO connector.DirectDriver: Connector logicalNode c1-19 exited with error: Waiting for queue element was interrupted! null
> {code}
> It may have been introduced by FLUME-596, which is a big refactor of Driver. I tried to git-reset to 21b74010c34cef9a977c75ab5dec4dc747d8f5aa, and cannot reproduce the problem there.
> Expected result -- source should be close()'ed
> {code}
> 2011-10-19 23:20:45,811 [SpawningLogicalNode c1] INFO collector.CollectorSource: closed
> 2011-10-19 23:20:45,812 [SpawningLogicalNode c1] INFO thrift.ThriftEventSource: Closed server on port 35853...
> 2011-10-19 23:20:45,817 [SpawningLogicalNode c1] INFO thrift.ThriftEventSource: Queue still has 0 elements ...
> 2011-10-19 23:20:45,852 [logicalNode c1-20] INFO collector.CollectorSource: closed
> 2011-10-19 23:20:45,852 [logicalNode c1-20] INFO thrift.ThriftEventSource: Closed server on port 35853...
> 2011-10-19 23:20:45,853 [logicalNode c1-20] INFO thrift.ThriftEventSource: Queue still has 0 elements ...
> 2011-10-19 23:20:45,853 [logicalNode c1-20] INFO collector.CollectorSource: closed
> 2011-10-19 23:20:45,853 [logicalNode c1-20] INFO thrift.ThriftEventSource: Closed server on port 35853...
> 2011-10-19 23:20:45,853 [logicalNode c1-20] INFO thrift.ThriftEventSource: Queue still has 0 elements ...
> 2011-10-19 23:20:45,853 [logicalNode c1-20] INFO agent.LogicalNode: c1: Connector stopped: CollectorSource | NullSink
> 2011-10-19 23:20:45,853 [SpawningLogicalNode c1] INFO agent.LogicalNode: Node config successfully set to com.cloudera.flume.conf.FlumeConfigData@2682d210
> 2011-10-19 23:20:45,864 [logicalNode c1-23] INFO agent.LogicalNode: Connector started: TailSource | NullSink
> {code}
> It might be the root cause of FLUME-798, and related to all the recent Interrupted exception discussion on user@.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
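
For readers following the report, the expected behavior (close() still runs when the driver thread is interrupted while blocked in next()) might be sketched roughly as below. This is not Flume's DirectDriver code: DriverSketch, BlockingSource, Driver, requestStop and demo are hypothetical names made up for illustration, and the sink is left out for brevity. It assumes only that next() blocks interruptibly and that cleanup belongs in a finally block.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these types are Flume's real classes.
class DriverSketch {

    /** Stand-in for a source whose next() blocks until an event arrives. */
    static class BlockingSource {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        final AtomicBoolean closed = new AtomicBoolean(false);

        String next() throws InterruptedException {
            return queue.take(); // blocks indefinitely while the queue is empty
        }

        void close() {
            closed.set(true);
        }
    }

    /** Driver thread that pulls from the source and always closes it on exit. */
    static class Driver extends Thread {
        private final BlockingSource source;
        private volatile boolean stopRequested = false;

        Driver(BlockingSource source) {
            this.source = source;
        }

        @Override
        public void run() {
            try {
                while (!stopRequested) {
                    source.next(); // a real driver would append the event to a sink
                }
            } catch (InterruptedException e) {
                // Interrupted while blocked in next(); fall through to cleanup.
            } finally {
                source.close(); // runs on normal exit, interrupt, or exception
            }
        }

        void requestStop() {
            stopRequested = true;
            interrupt(); // wakes the thread if it is blocked in next()
        }
    }

    /** Starts a driver on an empty source, stops it, and reports whether close() ran. */
    static boolean demo() {
        BlockingSource source = new BlockingSource();
        Driver driver = new Driver(source);
        driver.start();
        try {
            Thread.sleep(100); // give the driver time to block in next()
            driver.requestStop();
            driver.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return source.closed.get() && !driver.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("source closed: " + demo());
    }
}
```

The point of the sketch is only that cleanup sits in finally, so an interrupt used to break out of a blocking next() does not skip close(). Whether the 0.9.x driver could be restructured this way is not something the report establishes.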