vburenin commented on a change in pull request #4787:
URL: https://github.com/apache/hudi/pull/4787#discussion_r807478333
##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##########
@@ -555,6 +555,10 @@ public void refreshTimeline() throws IOException {
case INSERT_OVERWRITE_TABLE:
writeStatusRDD = writeClient.insertOverwriteTable(records, instantTime).getWriteStatuses();
break;
+ case DELETE_PARTITION:
+ List<String> partitions = records.map(record -> record.getPartitionPath()).distinct().collect();
Review comment:
It would be nice if this were a DeltaStreamer CLI parameter, something that
gets executed at the end of ingestion. hudi-cli may also work if it can be done
as a single CLI command. For example:
```
hudi-cli delete-partitions --schema-name someschema --table-name sometable \
  --location s3a://bucket/data --hive-server xxxxx --metastore=xxx \
  --do-not-delete-data
```
`--do-not-delete-data` may be helpful so that the data itself can be deleted
faster by other means: I use a Go tool that spins up hundreds of goroutines to
delete hundreds of thousands of data files within seconds.
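The fast external deletion described here boils down to fanning file deletes out over many concurrent workers. A minimal Java sketch of that idea, assuming the partition's data files are already listed as local `java.nio` paths: the class `ParallelDelete`, its `deleteAll` method, and the thread counts are hypothetical and are not part of Hudi, hudi-cli, or the Go tool mentioned in the comment. For an object store such as the `s3a://` location in the example, the same pattern would wrap a different delete call.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper (not a Hudi API): deletes files concurrently on a fixed-size thread pool. */
public class ParallelDelete {

    /** Deletes every path in {@code files} using {@code threads} workers; returns how many files were removed. */
    public static int deleteAll(List<Path> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (Path file : files) {
                // deleteIfExists returns false instead of throwing when the file is already gone,
                // so re-running after a partial failure is safe
                results.add(pool.submit(() -> Files.deleteIfExists(file)));
            }
            int deleted = 0;
            for (Future<Boolean> result : results) {
                // get() waits for the task and rethrows any IOException wrapped in an ExecutionException
                if (result.get()) {
                    deleted++;
                }
            }
            return deleted;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a throwaway "partition" directory with some fake data files
        Path dir = Files.createTempDirectory("partition");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            files.add(Files.createFile(dir.resolve("part-" + i + ".parquet")));
        }
        System.out.println("deleted " + deleteAll(files, 16) + " files");
        Files.delete(dir);
    }
}
```

A bounded pool is used here rather than one thread per file: Java platform threads are far heavier than goroutines, so the pool size caps concurrency while still overlapping the per-file I/O latency.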