[
https://issues.apache.org/jira/browse/FLUME-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918656#comment-13918656
]
Mark Reibert commented on FLUME-2055:
-------------------------------------
I am a little confused about how FLUME-2007 fixes this issue. If the NameNode
goes down, and stays down for longer than the close retry period, then what?
Ultimately Flume is a bit at the mercy of HDFS here; after all, if the HDFS rug
gets pulled out from under Flume, what can (or should) it do? The more
important questions are: what does the user do with the .tmp files? Will Flume
re-send all the data in the .tmp file? Do the files contain valid records? How
do we determine whether there are partial records that need to be discarded?
Both Flume and HDFS are going to have problems; that is just the nature of
things. So as a user I am going to see .tmp files, and it is difficult to know
just what I am supposed to do with them.
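For reference, the "close retry period" mentioned above corresponds to the HDFS sink close-retry settings introduced by FLUME-2007. A minimal sketch of an agent configuration tuning them (the agent and sink names here are hypothetical; `hdfs.closeTries` and `hdfs.retryInterval` are documented HDFS sink parameters in later Flume 1.x releases):

```properties
# Hypothetical agent "agent1" with an HDFS sink named "hdfs_sink"
agent1.sinks.hdfs_sink.type = hdfs
agent1.sinks.hdfs_sink.hdfs.path = /flume/events/%Y-%m-%d
# Number of times to retry closing a file after a failure;
# 0 means keep retrying until the close succeeds (per later 1.x docs)
agent1.sinks.hdfs_sink.hdfs.closeTries = 0
# Seconds to wait between consecutive close attempts
agent1.sinks.hdfs_sink.hdfs.retryInterval = 180
```

For a file already stuck open (the "Cannot obtain block length" error quoted below), one commonly cited recovery step on newer Hadoop versions is `hdfs debug recoverLease -path <file>` before renaming the .tmp file; whether the trailing records are complete still has to be verified by the consumer.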
> Flume leaves .tmp files in HDFS (unclosed?) after NameNode goes down
> --------------------------------------------------------------------
>
> Key: FLUME-2055
> URL: https://issues.apache.org/jira/browse/FLUME-2055
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.3.0
> Reporter: Hari Sekhon
> Assignee: Ted Malaska
>
> NameNode was restarted while Flume was still running, resulting in .tmp files
> left in HDFS that were never cleaned up and that subsequently broke MapReduce
> with an error implying the file wasn't closed properly:
> ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Cannot obtain block length for LocatedBlock{BP-1974494376-X.X.X.X
> Here is the Flume exception:
> ERROR [pool-9-thread-2] (org.apache.flume.source.AvroSource.appendBatch:302) - Avro source avro_source: Unable to process event batch. Exception follows.
> org.apache.flume.ChannelException: Unable to put batch on required channel: org.apache.flume.channel.MemoryChannel{name: hdfs_channel}
>         at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:200)
>         at org.apache.flume.source.AvroSource.appendBatch(AvroSource.java:300)
>         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.avro.ipc.specific.SpecificResponder.respond(SpecificResponder.java:88)
>         at org.apache.avro.ipc.Responder.respond(Responder.java:149)
>         at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.messageReceived(NettyServer.java:188)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
>         at org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:173)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:321)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:303)
>         at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:220)
>         at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>         at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>         at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>         at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:364)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:238)
>         at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38)
>         at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.flume.ChannelException: Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight
>         at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doCommit(MemoryChannel.java:128)
>         at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
>         at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:192)
>         ... 28 more
--
This message was sent by Atlassian JIRA
(v6.2#6252)