Ricky Saltzer created FLUME-1486:
------------------------------------
Summary: Ability to configure a staging directory for data
Key: FLUME-1486
URL: https://issues.apache.org/jira/browse/FLUME-1486
Project: Flume
Issue Type: Improvement
Components: Sinks+Sources
Reporter: Ricky Saltzer
It would be nice to be able to configure a staging directory for files being
written to HDFS. Once the file stream is complete the file would then be moved
to the configured "final" directory.
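
The write-then-move idea can be sketched as follows. This is only a
local-filesystem analogue in Python, not Flume code and not the HDFS API; the
function name `write_via_staging` and the directory and file names are
hypothetical, chosen for illustration.

```python
import os
import tempfile


def write_via_staging(staging_dir, final_dir, name, chunks):
    """Write chunks to staging_dir/name, then move the closed file to final_dir.

    Hypothetical helper: readers of final_dir only ever see complete files.
    """
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(final_dir, exist_ok=True)
    staged = os.path.join(staging_dir, name)
    final = os.path.join(final_dir, name)

    # Stream the data into the staging directory and close the file.
    with open(staged, "wb") as out:
        for chunk in chunks:
            out.write(chunk)

    # Move the finished file into place. On a local filesystem this rename is
    # atomic when both directories are on the same filesystem.
    os.replace(staged, final)
    return final


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    path = write_via_staging(
        os.path.join(root, "staging"),
        os.path.join(root, "final"),
        "events.log",
        [b"line 1\n", b"line 2\n"],
    )
    print(path)
```

Because the file only appears in the final directory after it is closed, a job
that lists that directory never picks up a file whose name is about to change.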
One example use case where this helps is log files which are being analyzed
by Hive. We could have a Hive table that points to an HDFS folder which
contains a bunch of log files. As it stands, if Flume is writing a tmp file
into that directory, and you fire up a MapReduce job, and that file is finished
being written to (thus changing the filename), then the job will fail because
it can't find that file.
The current workaround is to use virtual columns to not look at TMP files, but
this is tedious to do for every query. It would be nice to have a directory
Flume can write the files into; once it finishes streaming data to a file and
closes the file for writing, it can move it to the final directory.
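
For comparison, the per-query filtering that the workaround amounts to looks
roughly like the sketch below. It is a Python analogue rather than a Hive
query; the function name `completed_files` is hypothetical, and the `.tmp`
suffix is an assumption about how in-progress files are named.

```python
def completed_files(paths, in_use_suffix=".tmp"):
    """Return only the paths that do not carry the in-progress suffix.

    Hypothetical helper: the suffix is an assumption, not read from any config.
    """
    return [p for p in paths if not p.endswith(in_use_suffix)]


if __name__ == "__main__":
    listing = ["/logs/events.1.log", "/logs/events.2.log.tmp"]
    print(completed_files(listing))
```

Every consumer has to remember to apply a filter like this, which is the
tedium a staging directory would remove.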