[
https://issues.apache.org/jira/browse/FLUME-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashish Paliwal resolved FLUME-1010.
-----------------------------------
Resolution: Won't Fix
Fix Version/s: v0.9.5
Won't fix. 0.X branch not maintained anymore
> Collector failing but status ACTIVE
> -----------------------------------
>
> Key: FLUME-1010
> URL: https://issues.apache.org/jira/browse/FLUME-1010
> Project: Flume
> Issue Type: Bug
> Components: Build, Node
> Affects Versions: v0.9.4
> Environment: Debian Squeeze, ubuntu maverick, cloudera package
> Reporter: Nicolas Cerveaux
> Labels: collector, driver
> Fix For: v0.9.5
>
>
> After a certain time I received this error, nothing was written to the log
> until I restart the collector:
> 2012-02-24 16:57:41,553 ERROR com.cloudera.flume.core.connector.DirectDriver:
> Exiting driver logicalNode hdflume01.lamp.com-19 in error state
> CollectorSource | Collector because null
> My problem is the collector server is still marked as active and all the
> agents wont failover to the next collector, because the collector itself is
> still up.
> My configurations is the following:
> collectorSource
> collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
> collectorSource
> collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
> collectorSource
> collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
> tail("/var/log/nginx/localhost.access.log")
> agentE2EChain("hdflume01.lamp.com", "hdflume03.lamp.com")
> tail("/var/log/nginx/localhost.access.log")
> agentE2EChain("hdflume01.lamp.com", "hdflume03.lamp.com")
> I think was able to reproduce this error with two virtual machine when I was
> changing the same configuration of the collector to:
> collectorSource console
> 2012-02-25 13:46:17,711 INFO com.cloudera.flume.collector.CollectorSource:
> closed
> 2012-02-25 13:46:18,712 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port
> 35853...
> 2012-02-25 13:46:18,713 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 1
> elements ...
> 2012-02-25 13:46:19,942 WARN com.cloudera.flume.agent.LivenessManager:
> Heartbeats are backing up, currently behind by 1 heartbeats
> 2012-02-25 13:46:24,944 WARN com.cloudera.flume.agent.LivenessManager:
> Heartbeats are backing up, currently behind by 2 heartbeats
> 2012-02-25 13:46:29,946 WARN com.cloudera.flume.agent.LivenessManager:
> Heartbeats are backing up, currently behind by 3 heartbeats
> 2012-02-25 13:46:34,949 WARN com.cloudera.flume.agent.LivenessManager:
> Heartbeats are backing up, currently behind by 4 heartbeats
> 2012-02-25 13:46:39,941 ERROR com.cloudera.flume.agent.LogicalNode: Forcing
> driver to exit uncleanly
> 2012-02-25 13:46:39,942 ERROR
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Unexpected interrupt of
> close sleep interrupted
> java.lang.InterruptedException: sleep interrupted
> at java.lang.Thread.sleep(Native Method)
> at
> com.cloudera.flume.handlers.thrift.ThriftEventSource.close(ThriftEventSource.java:191)
> at
> com.cloudera.flume.collector.CollectorSource.close(CollectorSource.java:61)
> at
> com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:166)
> 2012-02-25 13:46:39,943 ERROR com.cloudera.flume.core.connector.DirectDriver:
> Closing down due to exception during close calls
> 2012-02-25 13:46:39,943 INFO com.cloudera.flume.core.connector.DirectDriver:
> Connector logicalNode gfiles02.lamp.com-18 exited with error: sleep
> interrupted
> java.lang.InterruptedException: sleep interrupted
> at java.lang.Thread.sleep(Native Method)
> at
> com.cloudera.flume.handlers.thrift.ThriftEventSource.close(ThriftEventSource.java:191)
> at
> com.cloudera.flume.collector.CollectorSource.close(CollectorSource.java:61)
> at
> com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:166)
> 2012-02-25 13:46:39,944 INFO com.cloudera.flume.collector.CollectorSource:
> closed
> 2012-02-25 13:46:39,944 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port
> 35853...
> 2012-02-25 13:46:39,944 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 2
> elements ...
> 2012-02-25 13:46:39,944 INFO com.cloudera.flume.agent.LogicalNode: Node
> config successfully set to FlumeConfigData: {srcVer:'Sat Feb 25 13:46:08 PST
> 2012' snkVer:'Sat Feb 25 13:46:08 PST 2012' ts='Sat Feb 25 13:46:08 PST
> 2012' flowId:'default-flow' source:'collectorSource' sink:'console' }
> 2012-02-25 13:46:39,957 INFO com.cloudera.flume.collector.CollectorSource:
> opened
> 2012-02-25 13:46:39,988 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking
> thread pool server on port 35853...
> 2012-02-25 13:46:39,997 INFO
> com.cloudera.flume.handlers.debug.ConsoleEventSink: ConsoleEventSink( debug )
> opened
> 2012-02-25 13:46:49,978 WARN
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Close timed out due to
> no progress. Closing despite having 2 values still enqueued
> 2012-02-25 13:46:49,978 INFO com.cloudera.flume.handlers.rolling.RollSink:
> closing RollSink
> 'escapedCustomDfs("hdfs://hh01.hadoop.com:54310/logs/%Y/%m/%d/%H/","-%{rolltag}"
> )'
> 2012-02-25 13:46:49,979 WARN
> com.cloudera.flume.handlers.endtoend.AckChecksumChecker: partial acks
> abandoned: {20120225-134554606-0800.212691888462127.00000021=1330206354608}
> 2012-02-25 13:46:49,979 ERROR com.cloudera.flume.core.connector.DirectDriver:
> Exiting driver logicalNode gfiles02.lamp.com-18 in error state
> CollectorSource | Collector because sleep interrupted
> In both case the error state that the node is in error state but they really
> are in an active state.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)