Ricky Saltzer created NIFI-2547:
-----------------------------------
Summary: Add DeleteHDFS Processor
Key: NIFI-2547
URL: https://issues.apache.org/jira/browse/NIFI-2547
Project: Apache NiFi
Issue Type: New Feature
Reporter: Ricky Saltzer
Assignee: Ricky Saltzer
There are times when a user may want to remove a file or directory from HDFS.
The reasons for this vary, but to provide some context, I currently have a
pipeline where I need to periodically delete files that my NiFi pipeline is
producing. In my case, the rule is "delete files once they are 7 days old".
Currently, I have to use the {{ExecuteStreamCommand}} processor and manually
call {{hdfs dfs -rm}}, which is awful when dealing with a large number of
files. For one, an entire JVM is spun up for each delete, and two, when
deleting directories with thousands of files, it can sometimes cause the
command to hang indefinitely.
With that said, I propose adding a {{DeleteHDFS}} processor that meets the
following criteria:
* Can delete both directories and files
* Can delete directories recursively
* Supports the NiFi Expression Language
* Supports using glob paths (e.g. /data/for/2017/08/*)
* Capable of being a downstream processor as well as a standalone processor
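
To make the glob, recursive-delete and "older than 7 days" requirements above
concrete, here is a minimal sketch. It is not the proposed processor: it runs
against the local filesystem with {{java.nio.file}} as a stand-in for HDFS, and
the class and method names ({{GlobDeleteSketch}}, {{delete}},
{{deleteOlderThan}}) are hypothetical. An HDFS version would presumably use
Hadoop's {{FileSystem.globStatus(Path)}} to expand the glob and
{{FileSystem.delete(Path, boolean)}} for the optionally recursive delete, all
inside the NiFi JVM rather than one JVM per delete.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem analogue of the requested behaviour.
public class GlobDeleteSketch {

    /**
     * Deletes a file, or a directory and everything beneath it when
     * recursive is true. Returns false if the target does not exist.
     */
    static boolean delete(Path target, boolean recursive) throws IOException {
        if (!Files.exists(target)) {
            return false;
        }
        if (Files.isDirectory(target) && recursive) {
            List<Path> deepestFirst;
            try (Stream<Path> walk = Files.walk(target)) {
                // Reverse order so children are removed before their parents.
                deepestFirst = walk.sorted(Comparator.reverseOrder())
                                   .collect(Collectors.toList());
            }
            for (Path p : deepestFirst) {
                Files.delete(p);
            }
            return true;
        }
        // Throws DirectoryNotEmptyException for a non-empty directory.
        Files.delete(target);
        return true;
    }

    /**
     * Expands a glob inside dir and deletes every match whose last-modified
     * time is older than maxAge. Returns the paths that were deleted.
     */
    static List<Path> deleteOlderThan(Path dir, String glob, Duration maxAge,
                                      boolean recursive) throws IOException {
        Instant cutoff = Instant.now().minus(maxAge);

        // Collect matches first so nothing is deleted while iterating.
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                matches.add(p);
            }
        }

        List<Path> deleted = new ArrayList<>();
        for (Path p : matches) {
            boolean expired = Files.getLastModifiedTime(p).toInstant().isBefore(cutoff);
            if (expired && delete(p, recursive)) {
                deleted.add(p);
            }
        }
        return deleted;
    }
}
```

For example, {{GlobDeleteSketch.deleteOlderThan(dir, "*", Duration.ofDays(7), true)}}
would remove every entry directly under {{dir}} that was last modified more
than 7 days ago, descending into directories as needed.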
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)