[ 
https://issues.apache.org/jira/browse/FLUME-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082368#comment-13082368
 ] 

Jonathan Hsieh commented on FLUME-734:
--------------------------------------

The root cause is that the OutputFormat in escapedFormatDfs is reused across 
each file it attempts to write, while the seqfile output format assumes it is 
only ever used with a single file handle.  I've been able to reliably 
reproduce this problem with a test case, and am working on a patch now.
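A minimal sketch of the failure mode described above. The class and method names here are hypothetical, not Flume's actual API; the sketch only illustrates an output format that binds itself to the first OutputStream it sees, so that reusing the same instance for a newly rolled file fails with the error quoted in the bug report.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for the seqfile output format: it remembers the
// first stream it writes to and refuses any other stream afterward.
class SingleStreamOutputFormat {
  private OutputStream bound; // the one stream this instance may write to

  void format(OutputStream out, byte[] event) throws IOException {
    if (bound == null) {
      bound = out; // first use: bind this instance to the stream
    } else if (bound != out) {
      // Reuse with a different stream (e.g. after a file roll) fails,
      // matching the log message quoted in the issue.
      throw new IOException(
          "OutputFormat instance can only write to the same OutputStream");
    }
    out.write(event);
  }
}

public class RollReuseDemo {
  public static void main(String[] args) throws IOException {
    SingleStreamOutputFormat fmt = new SingleStreamOutputFormat();
    fmt.format(new ByteArrayOutputStream(), "e1".getBytes()); // first file: ok
    try {
      // Roll to a "new file": same format instance, different stream.
      fmt.format(new ByteArrayOutputStream(), "e2".getBytes());
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Each roll then closes the file, the next write fails the same way, and the cycle repeats, which would produce the file-per-second behavior reported below.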

> escapedFormatDfs goes into a file creation frenzy
> -------------------------------------------------
>
>                 Key: FLUME-734
>                 URL: https://issues.apache.org/jira/browse/FLUME-734
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.4
>         Environment: CentOS 5.6
>            Reporter: Eran Kutner
>            Priority: Critical
>         Attachments: flume.log
>
>
> Using this configuration:
> collectorSource(54001) | collector(600000) { 
> escapedFormatDfs("hdfs://hadoop1-m1:8020/raw-events/%Y-%m-%d/", 
> "events-%{rolltag}-col1.snappy", seqfile("SnappyCodec")) }
> The expected behavior is to see a new file created every 10 minutes. However, 
> once in a while the collector would go into a file creation frenzy, creating 
> new files every second.
> The log indicates that writing has failed with the error "OutputFormat 
> instance can only write to the same OutputStream", causing the file to be 
> closed and a new one to be opened, only to be closed again.
> Looking at the code I'm not even sure how the output stream could change, 
> but the behavior I'm seeing feels like some sort of race condition. It 
> happens much more often under heavy load than under low load.
> See attached log excerpt.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira