vburenin commented on a change in pull request #4787:
URL: https://github.com/apache/hudi/pull/4787#discussion_r807478333



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##########
@@ -555,6 +555,10 @@ public void refreshTimeline() throws IOException {
       case INSERT_OVERWRITE_TABLE:
         writeStatusRDD = writeClient.insertOverwriteTable(records, instantTime).getWriteStatuses();
         break;
+      case DELETE_PARTITION:
+        List<String> partitions = records.map(record -> record.getPartitionPath()).distinct().collect();
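
The new `DELETE_PARTITION` branch derives the set of partitions to drop from the incoming records. Below is a minimal sketch of that step. It assumes a hypothetical `Rec` class in place of Hudi's record type and uses a plain `java.util.stream` pipeline over an in-memory list instead of the Spark RDD `map`/`distinct`/`collect` in the diff. It is an illustration, not the PR's implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctPartitions {

    // Hypothetical stand-in for a Hudi record; only the partition path matters here.
    static final class Rec {
        private final String key;
        private final String partitionPath;

        Rec(String key, String partitionPath) {
            this.key = key;
            this.partitionPath = partitionPath;
        }

        String getPartitionPath() {
            return partitionPath;
        }
    }

    // Same shape as records.map(record -> record.getPartitionPath()).distinct().collect()
    // in the diff, but over a local list: many records collapse to one entry per partition.
    static List<String> partitionsToDelete(List<Rec> records) {
        return records.stream()
                .map(Rec::getPartitionPath)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Rec> records = Arrays.asList(
                new Rec("k1", "2022/02/01"),
                new Rec("k2", "2022/02/01"),
                new Rec("k3", "2022/02/02"));
        // prints [2022/02/01, 2022/02/02]
        System.out.println(partitionsToDelete(records));
    }
}
```

A sequential stream keeps first-encounter order after `distinct()`. Spark's `RDD.distinct()` involves a shuffle and makes no such ordering guarantee, so the collected partition list in the diff should be treated as unordered.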

Review comment:
       It would be nice if this were a DeltaStreamer CLI parameter, something that gets executed at the end of ingestion. hudi-cli could also work if the operation can be done as a single CLI command. For example:
   ```
   hudi-cli delete-partitions --schema-name someschema --table-name sometable \
     --location s3a://bucket/data --hive-server xxxxx --metastore=xxx \
     --do-not-delete-data
   ```
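   `--do-not-delete-data` may be helpful for deleting data faster: the partitions are dropped from the table, and the files are removed separately with a dedicated tool. I use a Go tool that spins up hundreds of goroutines to delete hundreds of thousands of data files within seconds.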
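
The speed-up comes from issuing many independent deletes concurrently. A rough Java analogue of the goroutine approach is a fixed thread pool that submits one delete task per file. This is only a sketch: `ParallelDelete` and `deleteAll` are hypothetical names, and it targets a local filesystem through `java.nio.file`. The Go tool referenced above is not shown and may work differently. Against `s3a://` paths, the same pattern would wrap the object store or Hadoop `FileSystem` client instead of `Files.deleteIfExists`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDelete {

    // Deletes every path using a fixed pool of worker threads and returns how many
    // files were actually removed. Paths that no longer exist are skipped (not counted).
    static int deleteAll(List<Path> paths, int parallelism) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (Path p : paths) {
                // Each task is independent, so up to `parallelism` deletes run at once.
                results.add(pool.submit(() -> Files.deleteIfExists(p)));
            }
            int deleted = 0;
            for (Future<Boolean> f : results) {
                try {
                    if (f.get()) {
                        deleted++;
                    }
                } catch (ExecutionException e) {
                    // Rethrow the first failure found, in submission order.
                    throw new IllegalStateException("delete failed", e.getCause());
                }
            }
            return deleted;
        } finally {
            // Stop accepting new tasks; already-submitted tasks are allowed to finish.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("hudi-delete-demo");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add(Files.createFile(dir.resolve("part-" + i + ".parquet")));
        }
        int deleted = deleteAll(files, 64);
        // prints "deleted 1000 files"
        System.out.println("deleted " + deleted + " files");
        Files.delete(dir);
    }
}
```

The pool size caps concurrency, which plays the role of limiting the number of goroutines. For object stores, batch delete APIs can reduce request count further than per-file deletes.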




-- 
This is an automated message from the Apache Git Service.