Ricky Saltzer created NIFI-2547:
-----------------------------------

             Summary: Add DeleteHDFS Processor 
                 Key: NIFI-2547
                 URL: https://issues.apache.org/jira/browse/NIFI-2547
             Project: Apache NiFi
          Issue Type: New Feature
            Reporter: Ricky Saltzer
            Assignee: Ricky Saltzer


There are times when a user may want to remove a file or directory from HDFS. 
The reasons vary, but to provide some context, I currently have a pipeline 
whose output files I need to delete periodically. In my case, the rule is 
"delete files once they are 7 days old". 

Currently, I have to use the {{ExecuteStreamCommand}} processor and manually 
call {{hdfs dfs -rm}}, which works poorly when dealing with a large number of 
files. First, an entire JVM is spun up for each delete. Second, when deleting 
directories that contain thousands of files, the command can sometimes hang 
indefinitely. 

With that in mind, I propose adding a {{DeleteHDFS}} processor that meets the 
following criteria:

* Can delete both directories and files
* Can delete directories recursively
* Supports the NiFi Expression Language
* Supports using glob paths (e.g. /data/for/2017/08/*)
* Can run as a downstream processor or as a standalone processor
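
To make the glob, recursive-delete, and age-based criteria above a bit more 
concrete, here is a minimal sketch. It is not the proposed processor: it uses 
{{java.nio.file}} against a local directory as a stand-in for HDFS, and the 
names {{GlobDeleteSketch}}, {{delete}}, and {{deleteMatching}} are hypothetical. 
Against HDFS, the analogous calls would presumably be Hadoop's 
{{FileSystem.globStatus(Path)}} and {{FileSystem.delete(Path, boolean)}}. 

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem analogue of the behavior described above.
public class GlobDeleteSketch {

    /** Deletes a file, or a whole directory tree when recursive is true. */
    static boolean delete(Path target, boolean recursive) throws IOException {
        if (!Files.exists(target)) {
            return false; // nothing to delete
        }
        if (Files.isDirectory(target) && recursive) {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(target)) {
                // Reverse order puts children before their parent directories.
                paths = walk.sorted(Comparator.reverseOrder())
                            .collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
            return true;
        }
        // Throws DirectoryNotEmptyException for a non-empty directory.
        Files.delete(target);
        return true;
    }

    /** Deletes entries of parent that match glob and are older than maxAge. */
    static List<Path> deleteMatching(Path parent, String glob,
                                     Duration maxAge, boolean recursive)
            throws IOException {
        Instant cutoff = Instant.now().minus(maxAge);
        // Collect matches first so the directory stream is closed before deleting.
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(parent, glob)) {
            for (Path p : stream) {
                matches.add(p);
            }
        }
        List<Path> deleted = new ArrayList<>();
        for (Path p : matches) {
            Instant modified = Files.getLastModifiedTime(p).toInstant();
            if (modified.isBefore(cutoff) && delete(p, recursive)) {
                deleted.add(p);
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway temp directory: one stale file, one fresh file.
        Path dir = Files.createTempDirectory("glob-delete-demo");
        Path stale = Files.createFile(dir.resolve("stale.log"));
        Files.createFile(dir.resolve("fresh.log"));
        Files.setLastModifiedTime(stale,
                FileTime.from(Instant.now().minus(Duration.ofDays(8))));

        List<Path> deleted = deleteMatching(dir, "*.log", Duration.ofDays(7), true);
        System.out.println("Deleted: " + deleted);
    }
}
```

The 7-day cutoff here mirrors the retention rule from my use case. Whether the 
processor itself should filter by age, or leave that to an upstream processor, 
is an open design question that this sketch does not settle. 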





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
