Collector failing but status ACTIVE
-----------------------------------

                 Key: FLUME-1010
                 URL: https://issues.apache.org/jira/browse/FLUME-1010
             Project: Flume
          Issue Type: Bug
          Components: Build, Node
    Affects Versions: v0.9.4
         Environment: Debian Squeeze, ubuntu maverick, cloudera package
            Reporter: Nicolas Cerveaux


After a certain time I received this error, nothing was written to the log 
until I restart the collector:
2012-02-24 16:57:41,553 ERROR com.cloudera.flume.core.connector.DirectDriver: 
Exiting driver logicalNode hdflume01.lamp.com-19 in error state CollectorSource 
| Collector because null

My problem is the collector server is still marked as active and all the agents 
wont failover to the next collector, because the collector itself is still up.

My configurations is the following:
collectorSource 
collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
collectorSource 
collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
collectorSource 
collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
tail("/var/log/nginx/localhost.access.log")     
agentE2EChain("hdflume01.lamp.com", "hdflume03.lamp.com")
tail("/var/log/nginx/localhost.access.log")     
agentE2EChain("hdflume01.lamp.com", "hdflume03.lamp.com")

I think was able to reproduce this error with two virtual machine when I was 
changing the same configuration of the collector to:
collectorSource console

2012-02-25 13:46:17,711 INFO com.cloudera.flume.collector.CollectorSource: 
closed
2012-02-25 13:46:18,712 INFO 
com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 
35853...
2012-02-25 13:46:18,713 INFO 
com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 1 
elements ...
2012-02-25 13:46:19,942 WARN com.cloudera.flume.agent.LivenessManager: 
Heartbeats are backing up, currently behind by 1 heartbeats
2012-02-25 13:46:24,944 WARN com.cloudera.flume.agent.LivenessManager: 
Heartbeats are backing up, currently behind by 2 heartbeats
2012-02-25 13:46:29,946 WARN com.cloudera.flume.agent.LivenessManager: 
Heartbeats are backing up, currently behind by 3 heartbeats
2012-02-25 13:46:34,949 WARN com.cloudera.flume.agent.LivenessManager: 
Heartbeats are backing up, currently behind by 4 heartbeats
2012-02-25 13:46:39,941 ERROR com.cloudera.flume.agent.LogicalNode: Forcing 
driver to exit uncleanly
2012-02-25 13:46:39,942 ERROR 
com.cloudera.flume.handlers.thrift.ThriftEventSource: Unexpected interrupt of 
close sleep interrupted
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at 
com.cloudera.flume.handlers.thrift.ThriftEventSource.close(ThriftEventSource.java:191)
        at 
com.cloudera.flume.collector.CollectorSource.close(CollectorSource.java:61)
        at 
com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:166)
2012-02-25 13:46:39,943 ERROR com.cloudera.flume.core.connector.DirectDriver: 
Closing down due to exception during close calls
2012-02-25 13:46:39,943 INFO com.cloudera.flume.core.connector.DirectDriver: 
Connector logicalNode gfiles02.lamp.com-18 exited with error: sleep interrupted
java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at 
com.cloudera.flume.handlers.thrift.ThriftEventSource.close(ThriftEventSource.java:191)
        at 
com.cloudera.flume.collector.CollectorSource.close(CollectorSource.java:61)
        at 
com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:166)
2012-02-25 13:46:39,944 INFO com.cloudera.flume.collector.CollectorSource: 
closed
2012-02-25 13:46:39,944 INFO 
com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 
35853...
2012-02-25 13:46:39,944 INFO 
com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 2 
elements ...
2012-02-25 13:46:39,944 INFO com.cloudera.flume.agent.LogicalNode: Node config 
successfully set to FlumeConfigData: {srcVer:'Sat Feb 25 13:46:08 PST 2012' 
snkVer:'Sat Feb 25 13:46:08 PST 2012'  ts='Sat Feb 25 13:46:08 PST 2012' 
flowId:'default-flow' source:'collectorSource' sink:'console' }
2012-02-25 13:46:39,957 INFO com.cloudera.flume.collector.CollectorSource: 
opened
2012-02-25 13:46:39,988 INFO 
com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread 
pool server on port 35853...
2012-02-25 13:46:39,997 INFO 
com.cloudera.flume.handlers.debug.ConsoleEventSink: ConsoleEventSink( debug ) 
opened
2012-02-25 13:46:49,978 WARN 
com.cloudera.flume.handlers.thrift.ThriftEventSource: Close timed out due to no 
progress.  Closing despite having 2 values still enqueued
2012-02-25 13:46:49,978 INFO com.cloudera.flume.handlers.rolling.RollSink: 
closing RollSink 
'escapedCustomDfs("hdfs://hh01.hadoop.com:54310/logs/%Y/%m/%d/%H/","-%{rolltag}"
 )'
2012-02-25 13:46:49,979 WARN 
com.cloudera.flume.handlers.endtoend.AckChecksumChecker: partial acks 
abandoned: {20120225-134554606-0800.212691888462127.00000021=1330206354608}
2012-02-25 13:46:49,979 ERROR com.cloudera.flume.core.connector.DirectDriver: 
Exiting driver logicalNode gfiles02.lamp.com-18 in error state CollectorSource 
| Collector because sleep interrupted


In both case the error state that the node is in error state but they really 
are in an active state.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to