[ https://issues.apache.org/jira/browse/HADOOP-19254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903097#comment-17903097 ]

ASF GitHub Bot commented on HADOOP-19254:
-----------------------------------------

mukund-thakur commented on code in PR #7197:
URL: https://github.com/apache/hadoop/pull/7197#discussion_r1870172960


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/BulkDeleteCommand.java:
##########
@@ -0,0 +1,84 @@
+package org.apache.hadoop.fs.shell;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.BulkDelete;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.ArrayList;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.stream.Collectors;
+
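+/**
+ * Fs shell command to bulk delete files under a given base path, taking the
+ * paths either directly as arguments or from a file listing one fully
+ * qualified object name per line.
+ */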
+public class BulkDeleteCommand extends FsCommand {
+    public static void registerCommands(CommandFactory factory) {
+        factory.addClass(BulkDeleteCommand.class, "-bulkDelete");
+    }
+
+    public static final String name = "bulkDelete";
+
+    public static final String READ_FROM_FILE = "readFromFile";
+
+    public static final String USAGE = "[-" + READ_FROM_FILE + " <file>] <basePath> [<path> ...]";
+
+    public static final String DESCRIPTION = "Deletes the set of files under the given path. " +
+            "If a list of paths is provided then the paths are deleted directly. The user can " +
+            "also point to a file where the paths are listed as full object names.";
+
+    private String fileName;
+
+    /*
+     * The class is kept stateful because PathData initialization is only
+     * needed for the base path, not for every argument.
+     */
+    LinkedList<String> childArgs;
+
+    protected BulkDeleteCommand() {
+        this.childArgs = new LinkedList<>();
+    }
+
+    protected BulkDeleteCommand(Configuration conf) {
+        super(conf);
+    }
+
+    @Override
+    protected void processOptions(LinkedList<String> args) throws IOException {
+        CommandFormat cf = new CommandFormat(0, Integer.MAX_VALUE);
+        cf.addOptionWithValue(READ_FROM_FILE);
+        cf.parse(args);
+        fileName = cf.getOptValue(READ_FROM_FILE);
+    }
+
+    @Override
+    protected LinkedList<PathData> expandArguments(LinkedList<String> args) throws IOException {
+        if (fileName == null && args.size() < 2) {
+            throw new IOException("Invalid number of arguments: expected a base path " +
+                    "followed by at least one path to delete");
+        }
+        LinkedList<PathData> pathData = new LinkedList<>();
+        pathData.add(new PathData(args.get(0), getConf()));
+        args.remove(0);
+        this.childArgs = args;
+        return pathData;
+    }
+
+    @Override
+    protected void processArguments(LinkedList<PathData> args) throws IOException {
+        PathData basePath = args.get(0);
+        out.println("Deleting files under: " + basePath);
+        List<Path> pathList = new ArrayList<>();
+        if (fileName != null) {
+            FileSystem localFile = FileSystem.get(getConf());
+            // read one fully qualified object name per line
+            try (BufferedReader br = new BufferedReader(
+                    new InputStreamReader(localFile.open(new Path(fileName))))) {
+                String line;
+                while ((line = br.readLine()) != null) {
+                    pathList.add(new Path(line));
+                }
+            }
+        } else {
+            pathList.addAll(this.childArgs.stream().map(Path::new).collect(Collectors.toList()));
+        }
+        BulkDelete bulkDelete = basePath.fs.createBulkDelete(basePath.path);
+        bulkDelete.bulkDelete(pathList);

Review Comment:
   If you look at the API spec, bulkDelete fails if the number of paths is more than pageSize, so it's the caller's responsibility to paginate and make multiple bulk delete calls.
   See our current delete impl in S3A:
   https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L3484

   btw @steveloughran, I think we should migrate the old implementation to use the new BulkDelete interface here.
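
   A minimal sketch of that caller-side pagination, assuming the BulkDelete interface shape from HADOOP-18679 (pageSize(), Closeable, and bulkDelete() returning the failed (path, error) entries); pathList, basePath, and err are the names from the patch above:

   ```java
   // Sketch: issue one bulkDelete() call per page of at most pageSize() paths.
   // Assumes java.util.Map is added to the imports.
   try (BulkDelete bulkDelete = basePath.fs.createBulkDelete(basePath.path)) {
       final int pageSize = bulkDelete.pageSize();
       for (int start = 0; start < pathList.size(); start += pageSize) {
           List<Path> page = pathList.subList(start,
               Math.min(start + pageSize, pathList.size()));
           // each returned entry is a path that was not deleted, plus the error text
           for (Map.Entry<Path, String> failure : bulkDelete.bulkDelete(page)) {
               err.println("Failed to delete " + failure.getKey()
                   + ": " + failure.getValue());
           }
       }
   }
   ```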
   





> Implement bulk delete command as hadoop fs command operation 
> -------------------------------------------------------------
>
>                 Key: HADOOP-19254
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19254
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 3.4.1
>            Reporter: Mukund Thakur
>            Assignee: Harshit Gupta
>            Priority: Major
>              Labels: pull-request-available
>
> {code}
> hadoop fs -bulkdelete <base-url> <file> 
> {code}
> Key uses
> * QE: testing from Python and other scripting languages
> * Cluster maintenance: actual bulk deletion operations from the store
> One thought there: we MUST qualify paths with / elements: if a passed-in path 
> ends in /, it means "delete a marker", not "delete a dir"; if it doesn't 
> end in one, it refers to an object. This makes it possible to delete 
> surplus markers, or cases where there is a file above another file (cloudstore 
> listobjects finds these).
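> A hedged sketch of that trailing-slash rule over raw object keys (it can't be 
> expressed with org.apache.hadoop.fs.Path directly, since Path strips trailing 
> slashes when parsing; isDirMarker is a hypothetical helper):
> {code}
> // hypothetical helper: classify a key under the proposed trailing-slash rule
> static boolean isDirMarker(String key) {
>     return key.endsWith("/");   // "dir/" selects the marker; "dir" selects the object
> }
> {code}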



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
