[ https://issues.apache.org/jira/browse/FLUME-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496540#comment-13496540 ]
Mike Percy commented on FLUME-1702:
-----------------------------------
Whoops, didn't notice you filed this JIRA, Brock. Adding the description from
the duplicate ticket:
We should add the capability to the HDFS sink to specify a prefix for the .tmp
files. I believe this needs to be configurable and disabled by default.
However, we should document that we recommend "_" or "." as a prefix for the
temp files.
This is because Hadoop's default FileInputFormat skips files beginning with
"_" or "." (hidden files).
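A minimal standalone sketch of why the prefix helps. It assumes nothing from Hadoop or Flume: isHidden() only mirrors the "_"/"." rule described above, and the inFlightName() helper, its tmpPrefix parameter, and the file names are hypothetical, for illustration only. An empty prefix stands in for the proposed default (disabled).

```java
import java.util.ArrayList;
import java.util.List;

public class HiddenTmpFileSketch {

    // Mirrors the rule described above: the default FileInputFormat
    // skips names that begin with "_" or "." (hidden files).
    static boolean isHidden(String fileName) {
        return fileName.startsWith("_") || fileName.startsWith(".");
    }

    // Hypothetical helper: name of the file while it is still being written.
    // An empty tmpPrefix reproduces the current behavior (prefix disabled).
    static String inFlightName(String finalName, String tmpPrefix) {
        return tmpPrefix + finalName + ".tmp";
    }

    // Names a job applying the hidden-file rule would pick up from a listing.
    static List<String> visibleToJob(List<String> names) {
        List<String> visible = new ArrayList<>();
        for (String name : names) {
            if (!isHidden(name)) {
                visible.add(name);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<String> dir = List.of(
            "data.1",                      // closed file, safe to process
            inFlightName("data.2", ""),    // in flight, no prefix: still listed
            inFlightName("data.3", "_"));  // in flight, "_" prefix: skipped
        // Prints [data.1, data.2.tmp]
        System.out.println(visibleToJob(dir));
    }
}
```

With no prefix, the in-flight data.2.tmp is still picked up by the job and can be renamed out from under it; with "_" as the prefix, only the closed file is listed.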
> HDFSEventSink should write to a hidden file as opposed to a .tmp file
> ---------------------------------------------------------------------
>
> Key: FLUME-1702
> URL: https://issues.apache.org/jira/browse/FLUME-1702
> Project: Flume
> Issue Type: Improvement
> Reporter: Brock Noland
>
> Currently we write to a .tmp file. The problem is that if MR jobs are being
> run on the directory we are writing to, it's common for an MR job to
> list the directory, get a .tmp file, and then in the meantime the .tmp file
> is renamed, causing the job to fail when run.
> With Java MR you can use a PathFilter to avoid this; however, a custom
> solution is required for Pig, Hive, etc.
> Perhaps we should write to a hidden file so that MR never tries to process
> data in flight.