[ 
https://issues.apache.org/jira/browse/FLUME-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238779#comment-14238779
 ] 

Jeff Parsons commented on FLUME-2228:
-------------------------------------

I actually ran into a similar issue with Flume being unable to recover after a 
simple restart of the Hadoop Name Node. Two threads on different servers throw 
errors for a long time after the NN had come up and SafeMode was off. This is 
with Flume 1.4.0-cdh5.0.1.

The NN recovered well over two hours ago and this thread continues to try but 
doesn't seem to be getting the update on the SafeMode status. In the logs we do 
see other threads writing just fine...
{noformat}
09 Dec 2014 00:57:11,801 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.sink.hdfs.BucketWriter.append:477)  - Caught IOException 
writing to HDFSWriter (Cannot add block to 
/events/testdata/2014-12-08/FlumeData.1418080656419.tmp. Name node is in safe 
mode.
The reported blocks 6172466 needs additional 8129432 blocks to reach the 
threshold 0.9990 of total blocks 14316214.
Safe mode will be turned off automatically
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1197)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2751)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2666)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{noformat}

On another node, it was stuck with this error for about two hours and finally 
stopped reporting it recently
{noformat}
08 Dec 2014 23:39:17,406 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.sink.hdfs.BucketWriter.append:483)  - Caught IOException 
while closing file 
(hdfs://my-namenode:8020/events/testdata2/2014-12-08/FlumeData.1418078145239.tmp).
 Exception follows.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException):
 Cannot add block to /events/testdata2/2014-12-08/FlumeData.1418078145239.tmp. 
Name node is in safe mode.
The reported blocks 14316214 has reached the threshold 0.9990 of total blocks 
14316214. The number of live datanodes 17 has reached the minimum number 0. 
Safe mode will be turned off automatically in 4 seconds.

        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1197)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2751)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2666)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)

        at org.apache.hadoop.ipc.Client.call(Client.java:1409)
        at org.apache.hadoop.ipc.Client.call(Client.java:1362)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1437)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
{noformat}


> HDFS namenode failover cause flume error
> ----------------------------------------
>
>                 Key: FLUME-2228
>                 URL: https://issues.apache.org/jira/browse/FLUME-2228
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.3.0
>         Environment: redhat6.2,flume 1.3.0
>            Reporter: alex kim
>            Priority: Critical
>
> i test flume in HDFS namenode HA env ,
> here is my config for flume
> agent.sources = seqGenSrc
> agent.channels = memoryChannel
> agent.sinks = HDFSSink
> agent.sources.seqGenSrc.type = exec
> agent.sources.seqGenSrc.command = tail -f /tmp/mytest
> agent.sources.seqGenSrc.channels = memoryChannel
> agent.sinks.HDFSSink.type = hdfs
> agent.sinks.HDFSSink.hdfs.path = /myflume
> agent.sinks.HDFSSink.channel = memoryChannel
> agent.channels.memoryChannel.type = memory
> agent.channels.memoryChannel.capacity = 100
> i run this in commandline,write one record each 0.5 second
> # for i in `seq 1000`;do echo yes$i >> /tmp/mytest;sleep .5;done
> when write the file ,i restart active NN,to trigger NN autofailover,the 
> active become standby ,and standby become active
> and here is my flume log output
> 30 Oct 2013 13:55:13,192 WARN  
> [SinkRunner-PollingRunner-DefaultSinkProcessor] 
> (org.apache.flume.sink.hdfs.HDFSEventSink.process:418)  - HDFS IO error
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException):
>  Cannot add block to /myflume/FlumeData.1383111641112.tmp. Name node is in 
> safe m
> ode.
> The reported blocks 3722 has reached the threshold 0.9990 of total blocks 
> 3722. Safe mode will be turned off automatically in 4 seconds.
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2254)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2175)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:480)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1225)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>         at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:291)
>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:601)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>         at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1176)
>         at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1029)
>         at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
> 30 Oct 2013 13:55:18,194 ERROR 
> [SinkRunner-PollingRunner-DefaultSinkProcessor] 
> (org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated:82)  - 
> Unexpected error while che
> cking replication factor
> java.lang.reflect.InvocationTargetException
>         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:601)
>         at 
> org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:147)
>         at 
> org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:68)
>         at 
> org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:454)
>         at 
> org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:389)
>         at 
> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
>         at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:722)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException):
>  Cannot add block to /myflume/FlumeData.1383111641112.tmp. Name node is in 
> safe mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to