Ricky Saltzer created FLUME-1486:
------------------------------------

             Summary: Ability to configure a staging directory for data 
                 Key: FLUME-1486
                 URL: https://issues.apache.org/jira/browse/FLUME-1486
             Project: Flume
          Issue Type: Improvement
          Components: Sinks+Sources
            Reporter: Ricky Saltzer


It would be nice to be able to configure a staging directory for files being 
written to HDFS. Once the file stream is complete, the file would be moved 
to the configured "final" directory. 
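
To make the request concrete, here is a minimal sketch of the write-then-move pattern. It is not Flume code and not an existing Flume option: the local filesystem stands in for HDFS, and the function name `write_via_staging`, the directory names, and the file name are all hypothetical.

```python
import os
import tempfile


def write_via_staging(staging_dir, final_dir, name, lines):
    """Write a file in staging_dir, then move the closed file into final_dir."""
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(final_dir, exist_ok=True)

    staged_path = os.path.join(staging_dir, name)
    # Readers of final_dir never see the file while it is still open.
    with open(staged_path, "w") as f:
        for line in lines:
            f.write(line + "\n")

    final_path = os.path.join(final_dir, name)
    # The move happens only after the file is closed. os.replace is a
    # single rename when both directories are on the same filesystem.
    os.replace(staged_path, final_path)
    return final_path


# Hypothetical layout: a temp directory stands in for the HDFS root.
root = tempfile.mkdtemp()
staging = os.path.join(root, "staging")
final = os.path.join(root, "final")
path = write_via_staging(staging, final, "events.log", ["first", "second"])
```

With this layout, a job that reads only the final directory sees complete files under their final names.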

One example use case where this helps is log files that are being 
analyzed by Hive. We could have a Hive table that points to an HDFS folder 
containing a bunch of log files. As it stands, if Flume is writing a tmp file 
into that directory, you fire up a MapReduce job, and that file finishes 
being written (thus changing its filename), then the job will fail because it 
can't find that file. 
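
The failure described above can be reproduced in miniature. This is only an illustration on the local filesystem, not Hive or MapReduce itself; the file names and the `.tmp` suffix are assumptions chosen for the example.

```python
import os
import tempfile

# Hypothetical table directory that a query reads from.
table_dir = tempfile.mkdtemp()

# A writer has an in-progress file open in that same directory.
in_progress = os.path.join(table_dir, "events.log.tmp")
with open(in_progress, "w") as f:
    f.write("partial data\n")

# The job lists its input files up front, before reading any of them.
planned_inputs = [os.path.join(table_dir, n) for n in os.listdir(table_dir)]

# Meanwhile the writer finishes and renames the file to its final name.
os.rename(in_progress, os.path.join(table_dir, "events.log"))

# When the job gets around to reading, a planned input no longer exists.
missing = [p for p in planned_inputs if not os.path.exists(p)]
try:
    open(planned_inputs[0]).close()
    read_failed = False
except FileNotFoundError:
    read_failed = True
```

Here `missing` contains the old `.tmp` path and the read attempt fails, which mirrors the job failure above.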

The current workaround is to use virtual columns to exclude TMP files, but 
this is tedious to do for every query. It would be nice to have a staging 
directory that Flume can write the files into; once it finishes streaming data 
to a file and closes it for writing, it can move the file to the final directory. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
