Hi,

I agree, it does not appear to work that way today. It looks like there is already a JIRA for this: https://issues.apache.org/jira/browse/FLUME-1350
If you have any ideas or patches, please update that JIRA!

Brock

On Tue, Jul 31, 2012 at 1:37 PM, Yongcheng Li <[email protected]> wrote:
> Does anyone have any comments on using time (such as day/hour) as part
> of the file name? When it crosses the boundary of the defined time
> period, Flume creates a new file. What is the expected way of handling
> the old file (it does not meet any of the rollover conditions yet)? I
> would expect Flume to flush the data out to disk, close that file, and
> remove the .tmp suffix. Am I right? It does not behave in this manner
> right now.
>
> Regards,
>
> Yongcheng
>
> *From:* Gumnaam Sur [mailto:[email protected]]
> *Sent:* Tuesday, July 31, 2012 2:04 PM
> *To:* [email protected]
> *Subject:* Re: Flume 1.2.0 HDFS Sink Output File Question
>
> Is there a documented way of shutting down Flume?
>
> I just do kill -s TERM <pid>, and I do see Flume shutting down
> normally.
>
> But not all HDFS sink files are closed at times, even with a proper
> shutdown.
>
> E.g., I was testing a setup with 5 HDFS sinks, and only the last one
> defined in the conf file was being renamed to remove '.tmp'; the other
> four still had the '.tmp' extension.
>
> On Tue, Jul 31, 2012 at 1:52 PM, Denny Ye <[email protected]> wrote:
>
> Hi Yongcheng,
>
> Flume doesn't recheck the destination from the last Agent lifecycle.
> The last temporary file is not reused in the current process. Possible
> reasons for this case might be:
>
> 1. Was that temporary file closed normally? If not, Flume should close
>    that file in an appropriate way, such as the 'recoverLease'
>    interface.
> 2. Can that file name be reused in the latest path pattern?
>
> In either case, we hope for consistent behavior in the path pattern.
> As you mentioned, I agree with you. Maybe some others need to weigh in
> on this.
>
> -Regards
>
> Denny Ye
>
> 2012/7/31 Yongcheng Li <[email protected]>
>
> Hi,
>
> I am using the Flume 1.2.0 HDFS sink. When Flume crashes (is killed), a
> file with a .tmp suffix is generated. I believe it contains the data
> that were flushed to disk when the crash happened. But why does it have
> a .tmp suffix? Shouldn’t Flume just write it into a regular file
> (without .tmp)?
>
> I am using month/day/hour as part of my HDFS file name (%m_%d_%H). When
> the hour passes, it still has a file like
> 07_31_09.events.1343742385766.tmp with a size of zero. Shouldn’t Flume
> just close that file and remove the .tmp suffix? When I kill Flume, I
> can see data written into this file, but still with a .tmp suffix.
>
> Thanks!
>
> Yongcheng

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
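
For readers following FLUME-1350, the behavior Yongcheng expects (flush, close, and drop the .tmp suffix as soon as the time bucket changes) can be illustrated with a small Python sketch. This is not Flume's implementation and it writes to a local directory rather than HDFS; the class name `BucketedWriter` and the `.events` file naming are hypothetical, chosen only to mirror the `%m_%d_%H` pattern from the thread.

```python
import os
import tempfile
from datetime import datetime


class BucketedWriter:
    """Hypothetical sketch: one open .tmp file per time bucket.

    When an event arrives for a new bucket, the previous file is flushed,
    closed, and renamed to drop the .tmp suffix.
    """

    def __init__(self, directory, pattern="%m_%d_%H"):
        self.directory = directory
        self.pattern = pattern
        self.bucket = None
        self.handle = None
        self.tmp_path = None

    def write(self, event, now):
        bucket = now.strftime(self.pattern)
        if bucket != self.bucket:
            # Bucket boundary crossed: finalize the old file first.
            self.close()
            self.bucket = bucket
            self.tmp_path = os.path.join(self.directory, bucket + ".events.tmp")
            self.handle = open(self.tmp_path, "a", encoding="utf-8")
        self.handle.write(event + "\n")

    def close(self):
        """Flush, close, and rename the current file; no-op if none is open."""
        if self.handle is None:
            return
        self.handle.flush()
        os.fsync(self.handle.fileno())
        self.handle.close()
        # Note: os.replace overwrites an existing final file of the same name;
        # a real sink would need a unique suffix per file.
        os.replace(self.tmp_path, self.tmp_path[: -len(".tmp")])
        self.handle = None
        self.tmp_path = None
        self.bucket = None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        writer = BucketedWriter(d)
        writer.write("event a", datetime(2012, 7, 31, 9, 59))
        writer.write("event b", datetime(2012, 7, 31, 10, 0))  # crosses the hour
        writer.close()  # the equivalent of a clean shutdown
        print(sorted(os.listdir(d)))
```

The sketch only finalizes a file when the next event arrives or when `close()` is called, so a bucket that receives no further events would keep its .tmp file until shutdown; closing idle files would need a separate timer.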
