[
https://issues.apache.org/jira/browse/FLUME-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashish Paliwal resolved FLUME-754.
----------------------------------
Resolution: Won't Fix
Won't fix. The 0.X branch is no longer maintained.
> Collector can't start correctly if NameNode is down for E2E reliability mode.
> -----------------------------------------------------------------------------
>
> Key: FLUME-754
> URL: https://issues.apache.org/jira/browse/FLUME-754
> Project: Flume
> Issue Type: Bug
> Components: Node
> Affects Versions: v0.9.4
> Environment: Centos 5.5, cdh3u1, java 1.6.0_20.
> Ubuntu 10.04 x64, cdh3u1, pseudo-distributed mode, java 1.6.0_24.
> Reporter: Alexey Zotov
> Priority: Critical
> Labels: E2E, NameNode, collector
> Fix For: v0.9.5
>
>
> The collector can't start correctly if the NameNode is down. The collector goes
> into the ERROR state, but it can be recovered with the 'refresh' command.
> Here are sample configs:
> exec config azotov-ws 'text("/tmp/test.log")'
> 'agentE2ESink("localhost",35867)'
> exec config idea-collector 'collectorSource(35867)' 'collector( 10000 )
> {escapedFormatDfs("hdfs://localhost:8020/tmp/test3/",
> "flume-%{rolltag}.log")}'
> Here are the logs:
> 11/08/30 15:07:41 INFO conf.FlumeConfiguration: Loading configurations from
> /etc/flume/conf
> 11/08/30 15:07:41 WARN agent.FlumeNode: Log directory is writing inside of
> /tmp. This data may not survive reboot!
> 11/08/30 15:07:41 WARN text.FormatFactory: Unable to load output format
> plugin class - Class not found
> 11/08/30 15:07:41 INFO mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> 11/08/30 15:07:41 INFO util.InternalHttpServer: Starting internal HTTP server
> 11/08/30 15:07:41 INFO mortbay.log: jetty-6.1.26
> 11/08/30 15:07:41 INFO mortbay.log: Extract
> /usr/lib/flume/webapps/flumeagent.war to
> /tmp/Jetty_0_0_0_0_35862_flumeagent.war__flumeagent__4cgrz3/webapp
> 11/08/30 15:07:42 INFO mortbay.log: Started
> [email protected]:35862
> 11/08/30 15:07:42 INFO util.InternalHttpServer: Server started on port 35862
> 11/08/30 15:07:42 INFO agent.LogicalNodeManager: creating new logical node
> idea-collector
> 11/08/30 15:07:42 INFO agent.MultiMasterRPC: No active master RPC connection
> 11/08/30 15:07:42 INFO agent.LogicalNodeManager: Loading node name with
> FlumeConfigData: {srcVer:'Thu Jan 01 03:00:00 MSK 1970' snkVer:'Thu Jan 01
> 03:00:00 MSK 1970' ts='Thu Jan 01 03:00:00 MSK 1970' flowId:'null'
> source:'null' sink:'null' }
> 11/08/30 15:07:42 INFO agent.ThriftMasterRPC: Connected to master at
> localhost:35872
> 11/08/30 15:07:42 INFO agent.LogicalNode: Node config successfully set to
> FlumeConfigData: {srcVer:'Thu Jan 01 03:00:00 MSK 1970' snkVer:'Thu Jan 01
> 03:00:00 MSK 1970' ts='Thu Jan 01 03:00:00 MSK 1970' flowId:'null'
> source:'null' sink:'null' }
> 11/08/30 15:07:42 INFO agent.FlumeNode: Hadoop Security enabled: false
> 11/08/30 15:07:47 INFO rolling.RollSink: Created RollSink:
> trigger=[TimeTrigger: maxAge=10000
> tagger=com.cloudera.flume.handlers.rolling.ProcessTagger@317cfd38]
> checkPeriodMs = 250 spec='escapedFormatDfs(
> "hdfs://localhost:8020/tmp/test3/", "flume-%{rolltag}.log" )'
> 11/08/30 15:07:47 INFO collector.CollectorSource: opened
> 11/08/30 15:07:47 INFO agent.LogicalNode: Node config successfully set to
> FlumeConfigData: {srcVer:'Tue Aug 30 15:05:48 MSD 2011' snkVer:'Tue Aug 30
> 15:05:48 MSD 2011' ts='Tue Aug 30 15:05:48 MSD 2011' flowId:'default-flow'
> source:'collectorSource( 35867 )' sink:'collector( 10000 ) {
> escapedFormatDfs( "hdfs://localhost:8020/tmp/test3/", "flume-%{rolltag}.log"
> ) }' }
> 11/08/30 15:07:47 INFO thrift.ThriftEventSource: Starting blocking thread
> pool server on port 35867...
> 11/08/30 15:07:47 INFO rolling.RollSink: opening RollSink 'escapedFormatDfs(
> "hdfs://localhost:8020/tmp/test3/", "flume-%{rolltag}.log" )'
> 11/08/30 15:07:47 INFO debug.InsistentOpenDecorator: Opened MaskDecorator on
> try 0
> 11/08/30 15:08:23 INFO endtoend.AckChecksumChecker: Starting checksum group
> called 20110830-145747661+0400.678151906732325.00000027
> 11/08/30 15:08:23 INFO endtoend.AckChecksumChecker: initial checksum is
> 1321a567a8e
> 11/08/30 15:08:23 INFO hdfs.EscapedCustomDfsSink: Opening
> hdfs://localhost:8020/tmp/test3/flume-20110830-150817116+0400.678781361271117.00000022.log
> 11/08/30 15:08:25 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:8020. Already tried 0 time(s).
> 11/08/30 15:08:26 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:8020. Already tried 1 time(s).
> 11/08/30 15:08:27 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:8020. Already tried 2 time(s).
> 11/08/30 15:08:28 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:8020. Already tried 3 time(s).
> 11/08/30 15:08:28 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:8020. Already tried 4 time(s).
> 11/08/30 15:08:28 INFO debug.StubbornAppendSink: append Interrupted event
> 'azotov-ws [INFO Tue Aug 30 14:57:47 MSD 2011] { AckChecksum :
> (long)3036875317 (string) '
> 5' (double)1.500415765E-314 } { AckTag :
> 20110830-145747661+0400.678151906732325.00000027 } { AckType : msg }
> 2011-08-15
> 07:01:58\t2006\t153897273691618\t00008C52000000006A6B2F\twww.ancestry.com/\t228\tNULL\t4100\t10162981=50|10163002=50|10163004=50|10162970=50\tNULL\t2\tNULL\tUS\t217\tCOMPUTER\tWINDOWS_7\tCHROME\t-1120233552\t1141025753'
> with error: Blocked append interrupted by rotation event
> 11/08/30 15:08:28 INFO rolling.RollSink: closing RollSink 'escapedFormatDfs(
> "hdfs://localhost:8020/tmp/test3/", "flume-%{rolltag}.log" )'
> 11/08/30 15:08:28 ERROR connector.DirectDriver: Closing down due to exception
> during append calls
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1223)
> at
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:976)
> at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
> at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> at
> com.cloudera.flume.handlers.debug.InsistentOpenDecorator.close(InsistentOpenDecorator.java:175)
> at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> at
> com.cloudera.flume.handlers.debug.StubbornAppendSink.append(StubbornAppendSink.java:78)
> at
> com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
> at
> com.cloudera.flume.handlers.debug.InsistentAppendDecorator.append(InsistentAppendDecorator.java:110)
> at
> com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
> at
> com.cloudera.flume.handlers.endtoend.AckChecksumChecker.append(AckChecksumChecker.java:172)
> at
> com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
> at
> com.cloudera.flume.handlers.batch.UnbatchingDecorator.append(UnbatchingDecorator.java:62)
> at
> com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
> at
> com.cloudera.flume.handlers.batch.GunzipDecorator.append(GunzipDecorator.java:81)
> at
> com.cloudera.flume.collector.CollectorSink.append(CollectorSink.java:222)
> at
> com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:110)
> 11/08/30 15:08:28 INFO connector.DirectDriver: Connector logicalNode
> idea-collector-20 exited with error: null
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1223)
> at
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:976)
> at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
> at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> at
> com.cloudera.flume.handlers.debug.InsistentOpenDecorator.close(InsistentOpenDecorator.java:175)
> at
> com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
> at
> com.cloudera.flume.handlers.debug.StubbornAppendSink.append(StubbornAppendSink.java:78)
> at
> com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
> at
> com.cloudera.flume.handlers.debug.InsistentAppendDecorator.append(InsistentAppendDecorator.java:110)
> at
> com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
> at
> com.cloudera.flume.handlers.endtoend.AckChecksumChecker.append(AckChecksumChecker.java:172)
> at
> com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
> at
> com.cloudera.flume.handlers.batch.UnbatchingDecorator.append(UnbatchingDecorator.java:62)
> at
> com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
> at
> com.cloudera.flume.handlers.batch.GunzipDecorator.append(GunzipDecorator.java:81)
> at
> com.cloudera.flume.collector.CollectorSink.append(CollectorSink.java:222)
> at
> com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:110)
> 11/08/30 15:08:28 INFO collector.CollectorSource: closed
> 11/08/30 15:08:29 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:8020. Already tried 5 time(s).
> 11/08/30 15:08:29 INFO thrift.ThriftEventSource: Closed server on port
> 35867...
> 11/08/30 15:08:29 INFO thrift.ThriftEventSource: Queue still has 1000
> elements ...
> 11/08/30 15:08:30 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:8020. Already tried 6 time(s).
> 11/08/30 15:08:31 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:8020. Already tried 7 time(s).
> 11/08/30 15:08:32 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:8020. Already tried 8 time(s).
> 11/08/30 15:08:33 INFO ipc.Client: Retrying connect to server:
> localhost/127.0.0.1:8020. Already tried 9 time(s).
> 11/08/30 15:08:39 WARN thrift.ThriftEventSource: Close timed out due to no
> progress. Closing despite having 1000 values still enqueued
> 11/08/30 15:08:39 INFO rolling.RollSink: closing RollSink 'escapedFormatDfs(
> "hdfs://localhost:8020/tmp/test3/", "flume-%{rolltag}.log" )'
> 11/08/30 15:08:39 WARN endtoend.AckChecksumChecker: partial acks abandoned:
> {20110830-145747661+0400.678151906732325.00000027=1314701867662}
> 11/08/30 15:08:39 ERROR connector.DirectDriver: Exiting driver logicalNode
> idea-collector-20 in error state CollectorSource | Collector because null
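The report says the collector can be recovered with the 'refresh' command once it has entered the ERROR state. As a rough illustration only, the sketch below checks whether the NameNode address from the sink URI accepts TCP connections, so that a refresh is attempted only when HDFS is likely back. The function names and the fallback port of 8020 are assumptions made for this example and are not part of Flume; a successful TCP connect also does not prove the NameNode is healthy.

```python
import socket
from urllib.parse import urlparse


def namenode_address(dfs_uri, default_port=8020):
    """Extract (host, port) from an hdfs:// URI such as the sink path above.

    default_port is an assumption used only when the URI omits a port.
    """
    parsed = urlparse(dfs_uri)
    if parsed.scheme != "hdfs" or not parsed.hostname:
        raise ValueError("not an hdfs:// URI: %r" % dfs_uri)
    return parsed.hostname, parsed.port or default_port


def namenode_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    host, port = namenode_address("hdfs://localhost:8020/tmp/test3/")
    if namenode_reachable(host, port):
        print("NameNode port is open; the node can be refreshed")
    else:
        print("NameNode port is closed; a refresh would likely fail again")
```

Such a check could be run by an operator or a monitoring script before issuing the 'refresh' command for idea-collector; it does not change the collector's own behavior.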