[
https://issues.apache.org/jira/browse/FLUME-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918656#comment-13918656
]
Mark Reibert commented on FLUME-2055:
-------------------------------------
I am a little confused about how FLUME-2007 fixes this issue. If the NameNode
goes down, and stays down for longer than the close retry period, then what?
Ultimately Flume is a bit at the mercy of HDFS here; after all, if the HDFS rug
gets pulled out from under Flume, what can (or should) it do? The more
important questions are: what does the user do with the .tmp files? Will Flume
re-send all the data in the .tmp file? Do the files contain valid records? How
do we determine whether there are partial records that need to be discarded?
Both Flume and HDFS are going to have problems; that is just the nature of
things. So as a user I am going to see .tmp files, and it is difficult to know
just what I am supposed to do with them.
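For reference, the "close retry period" mentioned above corresponds to the HDFS sink close-retry settings introduced by FLUME-2007. A minimal sketch of an agent configuration tuning them (the agent and sink names here are hypothetical; `hdfs.closeTries` and `hdfs.retryInterval` are documented HDFS sink parameters in later Flume 1.x releases):

```properties
# Hypothetical agent "agent1" with an HDFS sink named "hdfs_sink"
agent1.sinks.hdfs_sink.type = hdfs
agent1.sinks.hdfs_sink.hdfs.path = /flume/events/%Y-%m-%d
# Number of times to retry closing a file after a failure;
# 0 means keep retrying until the close succeeds (per later 1.x docs)
agent1.sinks.hdfs_sink.hdfs.closeTries = 0
# Seconds to wait between consecutive close attempts
agent1.sinks.hdfs_sink.hdfs.retryInterval = 180
```

For a file already stuck open (the "Cannot obtain block length" error quoted below), one commonly cited recovery step on newer Hadoop versions is `hdfs debug recoverLease -path <file>` before renaming the .tmp file; whether the trailing records are complete still has to be verified by the consumer.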
> Flume leaves .tmp files in HDFS (unclosed?) after NameNode goes down
> --------------------------------------------------------------------
>
> Key: FLUME-2055
> URL: https://issues.apache.org/jira/browse/FLUME-2055
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.3.0
> Reporter: Hari Sekhon
> Assignee: Ted Malaska
>
> NameNode was restarted while Flume was still running, resulting in .tmp files
> left in HDFS that were never cleaned up and that subsequently broke MapReduce
> with an error implying the file wasn't closed properly:
> ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Cannot obtain block length for LocatedBlock{BP-1974494376-X.X.X.X
> Here is the Flume exception:
> ERROR [pool-9-thread-2] (org.apache.flume.source.AvroSource.appendBatch:302) - Avro source avro_source: Unable to process event batch. Exception follows.
> org.apache.flume.ChannelException: Unable to put batch on required channel: org.apache.flume.channel.MemoryChannel{name: hdfs_channel}
>         at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:200)
>         at org.apache.flume.source.AvroSource.appendBatch(AvroSource.java:300)
>         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:88)
>         at org.apache.avro.ipc.Responder.respond(Responder.java:149)
>         at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
>         at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:321)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:303)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:220)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>         at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:364)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:238)
>         at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38)
>         at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.flume.ChannelException: Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight
>         at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doCommit(MemoryChannel.java:128)
>         at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
>         at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:192)
>         ... 28 more
--
This message was sent by Atlassian JIRA
(v6.2#6252)